Detecting cancer earlier through screening seems an obviously
good idea given that stage at detection and prognosis are often 
correlated. 

Indeed, some types of cancer screening have proven very suc- 
cessful. Cervical cancer screening has drastically reduced mortal- 
ity from the disease, but also incidence because removal of 
precursor lesions has prevented the development of invasive can- 
cers. 

But cancer is complex, and our commonly used categories, 
such as “prostate cancer,” do not reflect the diversity of cases 
within them. This is a main reason that we still do not have a “cure 
for cancer,” although great progress in treatment has certainly 
been seen for several types. 

One complicating factor in relation to screening is that, per- 
haps counterintuitively, disease prevention strategies can cause 
important harms. As for any other medical intervention, these 
must be quantified and weighed against the benefits. Whether we 
should screen is not simply a question of whether screening 
“works.” 

We also cannot simply assume that cancer screening will pro- 
vide a benefit based on principle. That is why we generally need 
rigorous randomized trials prior to implementation. But after 
implementation, we must continuously monitor benefits and 
harms at the population level. This is because screening may 
behave differently outside the strict framework of a trial, but also 
because the premises for screening may be different. 
New methods of prevention, such as the HPV vaccine, could
change how and who we screen. New treatment options, new
diagnostic or screening tests, and changes in disease
burden may also alter the need for screening, the
population invited, or the frequency of testing.

We need to continuously optimize the balance between bene- 
fits and harms of our screening strategies. We also need to con- 
sider the cost-benefit balance of screening against other options 
when healthcare resources are limited. For that, we need agreed 
definitions of terms, good data, and a sound strategy to analyze 
them. That is the motivation for this book and the reason for its 
importance. 


Karsten Juhl Jørgensen
Centre for Evidence Based Medicine Odense
and Cochrane Denmark
Odense, Denmark


Cancer screening is a prominent strategy in cancer control, yet the 
ability to correctly interpret cancer screening data seems to elude 
many researchers, clinicians, and policy makers. My initial attempt 
to address that problem was to develop a short course on the assess- 
ment of cancer screening that focused on methodology and data 
interpretation. I first taught the course during the 2015 spring semes- 
ter at the Foundation for Advanced Education in the Sciences, infor- 
mally known as the National Institutes of Health Graduate School. 
As the semester went on, it became clear to me that my students and 
the larger community needed a text. I chose to call the text a primer 
as it covers the basics and is written as simply as possible. 

The primer reflects how cancer screening is perceived and 
practiced in the USA. It is best suited to those familiar with bio- 
medical research and public health practice in the USA. I expect 
it to be of use regardless of the reader’s cancer screening knowl- 
edge. Readers less familiar with the topic will want to start at the 
beginning and read straight through. Those with some experience 
may be able to read only the sections in which they are interested, 
or consult the primer for a formula or definition. Please note that 
the primer does not provide an assessment of the evidence avail- 
able for or against screening for specific cancers, except as rele- 
vant for the purpose of example. 

I encourage feedback and can be reached at marcusp@mail.nih.gov.


Pamela M. Marcus
Bethesda, MD, USA
October 2019


1 Foundations


The ability to understand cancer screening data does not require 
an extensive background in biostatistics, biology, or oncology. 
Rather, it requires clear thinking, an open mind, and knowledge of 
a small set of foundational concepts. Those concepts are presented 
in this chapter. 


1.1 Cancer 


The United States (US) National Cancer Institute’s (NCI) web- 
page, “What is Cancer?,” provides an overview of many biomedi- 
cal aspects of cancer, including its definition, how it arises, and 
how it progresses [1]. The webpage is a great resource for those 
who are starting out in cancer research. In the next two para- 
graphs, I summarize relevant topics from the webpage. 

Cancer is a complex disease [2], but for the purpose of this 
primer, it is sufficient to conceptualize it using its most notable 
features: abnormal cells whose division is usually unchecked. 
Tumors are collections of those cells. Tumors are classified by 
their ability to metastasize, that is, the ability of their cells to 
spread to other regions of the body. Tumors that do not and never 
will have metastatic potential are called benign, though they can 
kill by growing large enough to interfere with the proper functioning
of organs. Tumors that can metastasize or have metastasized are called
malignant. Malignant tumors are said to be invasive because they
have broken through the basement membrane, the barrier structure
on which those cells normally sit. The disruption of that
membrane allows cells to utilize the circulatory or lymph systems 
as routes to spread. Precancer refers to cells that have not broken 
through the basement membrane but are abnormal in some way 
that suggests they could break through in the future given the right 
(though generally unknown) circumstances. The terms precursor, 
pre-invasive, and pre-malignant sometimes are used instead, but 
precancer will be used in this primer. Strictly speaking, the word
cancer (minus the prefix) refers only to malignant tumors and will 
be used as such in this primer. Be aware, however, that the word 
cancer often is used in conjunction with precancer. For example, 
cervical cancer screening rarely leads to the detection of malig- 
nant disease; instead, it usually leads to the detection of early cel- 
lular changes that are consistent with our understanding of the 
natural history of cervical cancer. 

Cancer is not one disease; it is many diseases. Cancer behavior 
differs, for example, by and within organ site, by the type of cell 
that gave rise to the tumor, and by DNA mutations found in the 
tumor cells. Treatment and prognosis often vary by these charac- 
teristics. In the past it was assumed that all cancer would be fatal 
if left untreated, but we know now that some tumor types regress, 
stall, or grow so slowly that they are of no clinical relevance. 

It is expected that about 1.8 million people in the US will be 
diagnosed with cancer in 2019, and about 607,000 will die of the 
disease [3]. A little under half the deaths will be due to cancer of 
the lung and bronchus (143,000), colorectum (51,000), female 
breast (42,000), prostate (32,000), and cervix uteri (typically 
referred to as cervix; 4300). Cancer screening activity in the US is 
focused on those five organ sites, although screening for other 
organ sites does occur, often in high-risk populations. 
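
The "a little under half" figure can be checked directly from the counts quoted above. The short Python sketch below only redoes that arithmetic; the variable names are illustrative, and the counts are the rounded projections cited in the text [3].

```python
# Projected 2019 US cancer deaths quoted in the text (rounded counts).
deaths_by_site = {
    "lung and bronchus": 143_000,
    "colorectum": 51_000,
    "female breast": 42_000,
    "prostate": 32_000,
    "cervix uteri": 4_300,
}
total_cancer_deaths = 607_000

# Deaths at the five organ sites on which US screening is focused.
five_site_deaths = sum(deaths_by_site.values())
share = five_site_deaths / total_cancer_deaths

print(f"Deaths at the five screened sites: {five_site_deaths:,}")
print(f"Share of all cancer deaths: {share:.1%}")
```

The five sites together account for 272,300 of the 607,000 projected deaths, or roughly 45%.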


1.2 Cancer Statistics 


The first step in characterizing the extent of any public health 
problem is to collect data. In the US, our go-to source for can- 
cer data is the Surveillance, Epidemiology and End Results 
Program, known world-wide simply as SEER [4]. SEER was
established by the 1971 National Cancer Act [5] and has pro- 
vided authoritative data on US cancer incidence, survival, and 
mortality for the years 1975 and later. SEER collects data on 
every cancer in 19 geographic areas, covering about 34% of 
the US population [6]. SEER data are available in both sum- 
mary and raw form [7-9]. 

Cancer data also are collected through the National Program of 
Cancer Registries, which was established by the Centers for 
Disease Control and Prevention (CDC) in 1992. Through this program,
high-quality cancer registry data have been collected for
97% of the US population and Puerto Rico, the US Pacific Island 
Jurisdictions, and the US Virgin Islands [10]. 


1.3 Cancer Screening 


Cancer screening refers to routine, periodic testing for signs of 
cancer among individuals who have no symptoms. It is a form of 
secondary prevention. In the context of cancer screening, the goal 
of secondary prevention is to improve outcomes by shifting stage 
at diagnosis to one that is less advanced and deleterious, relative 
to what occurs in the absence of cancer screening. 

Cancer screening is a sorting process. Screenees are sorted into 
two groups: those with a negative test and those with a positive 
test. A negative test finds nothing suspicious for cancer and does 
not require additional medical attention. A positive test reveals 
something that is suspicious for cancer or with unknown signifi- 
cance regarding cancer; it requires additional medical attention, 
referred to as diagnostic evaluation. That process is intended to 
definitively determine whether cancer is or is not present, but in 
practice can range from active surveillance to the removal of an 
abnormality. Active surveillance (sometimes called watchful 
waiting) refers to a schedule of minimally- or non-invasive testing 
to monitor for clinically important changes. Resection of an 
abnormality is considered diagnostic evaluation rather than treat- 
ment if a definitive diagnosis has not yet been made or cannot be 
made otherwise. 
Cancer screening is not intended in and of itself to provide a
definitive diagnosis. Its intent is to identify abnormal medical 
conditions, such as growths, occult blood, or a biomarker that 
may suggest cancer. Cancer screening aims to lead to the detec- 
tion of cancers whose prognosis will improve with earlier detec- 
tion, and it needs to lead to the detection of enough of those 
cancers to make screening a worthwhile public health activity. 
Cancer screening is neither intended to nor is able to lead to detec- 
tion of every cancer, as the natural history of cancer is erratic, 
technology has limitations, and frequent screening is impractical. 

In the United States, lung and prostate cancer screening tend to 
detect invasive cancer and not precancer. Screening for colorectal 
and breast cancer leads to the detection of invasive cancer and 
precancer. Cervical cancer screening leads to the detection of
precancer, certain human papillomavirus (HPV) infections (the
causal agent), and on occasion invasive cancers. Cervical cancer
screening also can detect cellular changes that occur very early in 
the cancer process. Those abnormalities are classified as precan- 
cer in this primer. 

The reader may come across the phrases early detection and 
early diagnosis in discussions of cancer screening and wonder 
how the two differ. Early diagnosis refers to a strategy of symptom
awareness intended to advance the time of diagnosis. The
phrases symptom-aware detection and symptom-vigilant detec- 
tion are more descriptive than early diagnosis but are rarely used. 
Early detection comprises early diagnosis and screening. Other 
phrases that can be confusing are cancer prevention screening and 
early detection screening. Cancer prevention screening refers to 
cancer screening that leads to the detection of precancer, and early 
detection screening refers to cancer screening that leads to the 
detection of invasive cancer. 

Principles of early diagnosis will not be discussed in this 
primer. The remainder of this primer, with the exception of Chap. 
8, is written for the assessment of early detection screening, 
though the material is equally applicable to cancer prevention 
screening in nearly all instances. Any material that does not apply is
noted as such.


1.4 Population-Based Cancer Screening


Population-based cancer screening refers to a cancer control prac- 
tice in which all individuals who meet certain minimal criteria can 
choose to receive cancer screening. The term population-based is 
intended to connote that nearly everyone — that is, almost the 
entire eligible population — is targeted for cancer screening. 
Sometimes the phrase mass cancer screening is used instead. 

In population-based cancer screening, individuals who are eli- 
gible for screening are offered a relatively standard screening 
regimen, standard in terms of the test and frequency. Population- 
based screening regimens are not intended for individuals who are 
at extremely elevated cancer risk due to an unusual exposure or a 
personal or family history of cancer. When we speak of population-based
cancer screening, we exclude those individuals, who make up a very small
fraction of the entire population, because they usually undergo a more
intense screening regimen than that used in population-based cancer
screening.

The focus of this primer is population-based cancer screening, 
but principles regarding methodology and assessment still apply 
when screening individuals at unusually elevated cancer risk. 
Individuals at that level of risk may weigh benefits and harms of 
cancer screening differently than those at average risk. Oftentimes 
more intense cancer screening regimens are offered to individuals 
at extremely elevated cancer risk. For those individuals, the term 
surveillance, rather than screening, typically is used. 

For conciseness, the phrase population-based often will be omitted as a
modifier of the phrase cancer screening in this primer when it is clear
that population-based cancer screening is under discussion. The
reader should therefore assume that population-based cancer screening is
being discussed unless otherwise noted.

Readers who are interested in the features of ideal population- 
based disease screening programs can consult Principles and 
Practice of Screening for Disease, published in 1968 by Wilson 
and Jungner [11]. 


1.5 Choosing the Cancers for Which
We Screen 


Population-based cancer screening occurs routinely in the US for 
five cancers that, in the absence of screening, typically present as 
invasive cancer: female breast, cervical, colorectal, lung, and 
prostate. We screen for these cancers because their invasive forms 
can lead to morbidity and premature mortality. We also screen 
because there is evidence, or in some instances suspicion, that 
cancer screening is beneficial. The fact that cancer screening is 
recommended by professional organizations or has become estab- 
lished in community settings does not necessarily mean that con- 
clusive evidence of a benefit exists. Adoption of unproven cancer 
screening tests has occurred in the US and elsewhere. 

This primer will not delve into the evidence that supports (or 
does not support) population-based cancer screening for the five 
aforementioned cancers. Many well-respected and up-to-date 
resources for that information already exist [12, 13]. The purpose
of this primer is to teach the reader how to assess and interpret 
cancer screening through the use of data, not to provide a review 
of literature on the benefits and harms of screening for specific 
cancers. 


1.6 Choosing Who to Screen 


Consideration of who to screen begins with identification of the 
factors that are known to meaningfully increase cancer risk. Next, 
prevalence of the risk factors is considered. Sufficient risk and 
sufficiently prevalent risk factors are necessary to effect an
absolute reduction in cancer morbidity and mortality that is large 
enough to justify population-based cancer screening (assuming, 
of course, that cancer screening is of benefit). Population-based 
cancer screening is a resource-intense cancer control method and 
generally is not used for rare cancers. 

Age is the strongest risk factor for adult cancer and as such 
cancer screening recommendations are based on that factor. For 
lung cancer screening, smoking history also is a criterion because 
of its prevalence and strong association with the disease. We do
not screen males for breast cancer or never smokers for lung can- 
cer because the chance of individuals in those groups developing 
the respective cancers is very low. The day may come when age is 
augmented by genomic or other biologic information to drive can- 
cer screening recommendations, both for and against screening. 
We are not yet in that era of personalized cancer screening for 
individuals at average risk, however. 

We choose to screen those for whom we believe the benefit 
outweighs the harm, though we can only assess that for a popula- 
tion, not for an individual. The term individual refers to the person 
who is offered screening, while the term population refers to the 
entire group of individuals who have been offered screening. At 
the population level, we can examine changes in beneficial out- 
comes and harmful experiences with the advent of screening. At 
the individual level, we can never know who will or did benefit 
from screening, as we do not know what will happen or what 
would have happened in the absence of screening. 


1.7 The Cancer Screening Process 


Cancer screening cannot result in benefit without the successful 
completion of other components of the screening process, which 
encompasses all activities that lead up to and come after applica- 
tion of the screening test. The screening process begins when 
potential screenees are notified of the option to be screened and 
ends, at the earliest, when results of the screening test are relayed 
to the screenee. For those who receive a positive result, the process 
will extend to diagnostic evaluation and may include cancer diag- 
nosis and treatment. 

The resources that are needed to carry out a successful cancer 
screening effort include more than just those required to adminis- 
ter the cancer screening test. Consideration of resources employed 
in population-based cancer screening must include, at a mini- 
mum, those associated with screening invitation, assessment of 
eligibility, informed decision making, test interpretation, report- 
ing of results, and diagnostic evaluation and cancer treatment as 
needed. Other considerations include time and wages lost by individuals
who attend screening, and other ways, perhaps more critical or
economically efficient, in which screening resources could be used.
Readers who would like to learn more
about the screening process can consult Zapka et al. [14] and 
Beaber et al. [15]. 


1.8 Cancer Screening Tests 


Cancer screening tests also are known as cancer screening modal- 
ities. The screening tests we use in the US are either image-based 
or biospecimen-based. Imaging tests are used for breast cancer 
(mammography, digital tomosynthesis), colorectal cancer (sig- 
moidoscopy, colonoscopy, virtual colonography), and lung cancer 
(low-dose computed tomography (LDCT)). Biospecimen-based
tests are used for cervical cancer (Pap smear, HPV testing),
colorectal cancer (fecal occult blood testing (FOBT)), and pros- 
tate cancer (prostate-specific antigen (PSA)). 

Some cancer screening tests also are used as diagnostic tests. 
Colonoscopy is used as a colorectal cancer screening test as well 
as for evaluation of symptoms or follow-up of a positive FOBT. A 
positive PSA screening test may lead to serial PSA tests to moni- 
tor for changes in PSA. The term indication refers to the reason 
for performing a test. 


1.9 Organized Screening Programs Versus 
Opportunistic Screening 


Cancer screening practices vary from country to country. Reasons 
include cultural differences, differing interpretations of evidence, 
and varying public health needs. Central to these choices, how- 
ever, is the manner in which health care is administered and deliv- 
ered. Organized screening programs are found in countries with 
nationalized health care, a setting in which a government body 
decides on the best medical practices, including cancer screening, 
and offers and administers, free of charge, only those services 
deemed appropriate. Infrastructure usually exists to facilitate
screening and to manage the experiences of those individuals who 
receive a positive screening result. Opportunistic screening occurs 
in the US and in other countries without nationalized health care. 
Opportunistic screening provides more choice, but individuals 
typically are left on their own to navigate the process. Opportunistic 
screening also occurs in countries with organized screening pro- 
grams if the primary care physician arranges it or the screenee 
requests it, but in some jurisdictions the costs of the test must be 
borne by the individual. 

The methods described in this primer can be used to interpret 
data from organized or opportunistic screening settings. Readers 
who wish to learn more about organized screening can consult 
Raffle and Gray’s Screening: Evidence and Practice [16]. 


1.10 Benefit Versus Harm 


Assessment of cancer screening tests can be contentious because 
disagreements exist regarding what constitutes benefit, what con- 
stitutes harm, and how to balance the two. We can measure and 
have measured the impact at a population level by looking for 
reductions in cause-specific mortality rates. Cause-specific refers 
to the cause of death that we aim to prevent by cancer screening. 
Reduction in cause-specific incidence rates is employed for tests 
that detect precancer and will be discussed in Chap. 8. As the 
reader will learn, a reduction in cause-specific incidence rates will
lead to a reduction in cause-specific mortality rates in most 
instances. 
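
As a rough illustration of what a population-level comparison of cause-specific mortality rates involves, the sketch below computes rates per 100,000 person-years for two groups and the reduction between them. The death counts and person-years are hypothetical numbers invented for this example, and the function name is an assumption. The per-100,000 person-years scaling is a common convention, not necessarily the form used elsewhere in this primer.

```python
def mortality_rate_per_100k(deaths: int, person_years: float) -> float:
    """Cause-specific deaths per 100,000 person-years of follow-up."""
    return deaths / person_years * 100_000

# Hypothetical counts, invented purely for illustration.
rate_unscreened = mortality_rate_per_100k(deaths=300, person_years=1_000_000)
rate_screened = mortality_rate_per_100k(deaths=240, person_years=1_000_000)

# Absolute reduction: difference between the two rates.
absolute_reduction = rate_unscreened - rate_screened
# Relative reduction: that difference as a fraction of the unscreened rate.
relative_reduction = absolute_reduction / rate_unscreened

print(f"Unscreened: {rate_unscreened:.1f} per 100,000 person-years")
print(f"Screened:   {rate_screened:.1f} per 100,000 person-years")
print(f"Absolute reduction: {absolute_reduction:.1f} per 100,000 person-years")
print(f"Relative reduction: {relative_reduction:.0%}")
```

Only deaths from the cancer targeted by screening enter the numerators, which is why such a rate cannot reflect harms that leave length of life and cause of death unchanged.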

Cause-specific mortality rates are unable to reflect any harms 
other than those that affect length of life or cause of death. Yet 
there are many potential harms of screening, including psycho- 
logical impact of screening results, diversion of resources away 
from other health care needs, and late effects (also known as 
downstream effects) of diagnostic evaluation or cancer treatment. 
These harms often are difficult to measure and difficult to attribute to
the screening process, and they vary by screenee. Nevertheless, they
are real, and metrics need to be developed that can incorporate
them so the net impact of population-based cancer screening programs
can be measured.

Benefits and harms can occur at an individual level or a popu- 
lation level. Individual-level harms are more perceptible than 
population-level harms, but it is at the individual level that the 
trade-off between benefit and harm is most murky. Acceptable 
benefit-to-harm ratios differ by individual, because fear of cancer, 
risk tolerance, risk illiteracy, and other factors vary from person to 
person. 

Reduction in cause-specific mortality rates remains the stan- 
dard by which most organizations and researchers judge the ben- 
efit of population-based cancer screening programs, as it reflects 
advances in reducing the rates of cancer death, as well as exten- 
sion of life among those who die of the disease. Lack of a reduc- 
tion in cause-specific mortality is typically interpreted to mean 
that cancer screening does not result in benefit. 

Breast and colorectal cancer screening have been shown to 
reduce cause-specific mortality in randomized controlled trials 
(RCTs), though the tests examined in those trials are now out- 
dated. Newer tests have become the cancer screening standard of 
care, based on those tests’ improvement in performance measures 
(Chap. 3) relative to the previous, RCT-tested cancer screening
standard of care, and without evidence that the newer tests
reduce cause-specific mortality rates. The methodological issues 
involving the adoption of a newer test based on a comparison with 
the current standard of care test are discussed in Chap. 9. 


1.11 Efficacy and Effectiveness of Cancer 
Screening 


Efficacy refers to the ability of cancer screening to reduce cause- 
specific mortality rates in an experimental setting. Effectiveness 
refers to the ability to effect the same reduction in a community
setting, one in which individuals choose whether to be screened as 
part of their usual health care. Ideally, efficacy is studied first, and 
the cancer screening test does not disseminate into community
settings until it is known to be efficacious. 

Efficacy does not guarantee effectiveness. Given their rigor 
and intense oversight of patient experiences in experimental set- 
tings, efficacy studies are considered to provide the best-case 
scenario regarding cancer screening’s ability to reduce cause- 
specific mortality rates. In community settings, failures in the 
screening process, such as delayed communication of screening 
results, inadequate diagnostic evaluation, and lack of access to 
appropriate cancer treatment can hinder the realization of a 
cause-specific incidence or mortality reduction. However, cancer 
screening can be effective even in the presence of challenges and 
imperfections. 


1.12 Cancer Screening: Turning Healthy People 
Into Cancer Patients 


Individuals who present for cancer screening are healthy for all 
intents and purposes; neither they nor their doctors have any rea- 
son to believe they have cancer. A fraction of those screened will 
be diagnosed and become cancer patients. The diagnosis may lead 
to prevention of death from cancer. However, it may reflect screen- 
detection of a cancer that never would have been life-threatening. 
To say the latter is unfortunate is an understatement. Cancer is a 
disease that significantly affects every aspect of life. 

There is evidence that screening for breast, lung, cervical, 
colorectal, and perhaps prostate cancer reduces cause-specific 
mortality relative to the absence of screening, even if there is dis- 
agreement regarding the extent of benefit or for whom the benefit 
exists. In addition to possible benefits, potential screenees need to 
be informed of the possible harms when the option of cancer 
screening is raised. Some individuals may opt out of cancer 
screening; for them, the possible harms outweigh the possible 
benefits. The choice is reasonable, as it reflects what matters to 
them. 


2 Behind the Scenes


Cancer screening aims to interfere with disease progression by 
detecting cancer at a point in its natural history when it is either 
curable or, if not curable, when treatment will extend life beyond 
what it would have been in the absence of cancer screening. The 
phrase screen detected and similar terms are used in this chapter 
but are a bit of a misnomer, as cancer screening tests are not the 
final arbiter of the presence of cancer. Screen detected is intended 
to mean that cancer screening initiated a process that led to the 
diagnosis of a cancer. 


2.1 A Simple Model of the Natural History 
of Cancer 


The natural history of cancer is complex and for the most part not 
well understood. Furthermore, there is great variability, even 
among tumors of the same organ. Figure 2.1 depicts how cancer 
progresses. It is a gross oversimplification of the process, but it
is a useful aid in explaining how cancer screening aims to inter- 
fere in the disease process. 

[Figure: four boxes in sequence. Phase A: cancer is present, asymptomatic, undetectable with screening. Phase B: cancer is present, asymptomatic, detectable with screening. Phase C: cancer is present, symptomatic. Phase D: death due to cancer.]

Fig. 2.1 Four phases of cancer progression

Figure 2.1 displays four phases of cancer progression that are
relevant to cancer screening. Cancer is present, asymptomatic,
and not yet detectable by screening in Phase A. In Phase B (previously
known as the detectable pre-clinical phase or DPCP), cancer
is still asymptomatic but has characteristics that should, in the
best of all possible worlds, make it detectable through cancer 
screening. Examples of characteristics are size and shedding of 
tumor cells that could be detected in a biospecimen. A cancer in 
Phase B may not be screen detected, however; the individual may 
not be screened or the test may give an inaccurate result due to its 
limitations. In Phase C, cancers come to clinical attention due to 
symptoms. Phase C includes cancers that are curable as well as 
those that are not. In Phase D, cancer causes death. 

It is important to note that Phases A and B are a function of the 
cancer screening test. A cancer may be classified as being in Phase 
B if a technologically advanced test is used to screen, but in Phase 
A otherwise. For example, some lung cancers that can be detected 
with low-dose computed tomography (LDCT) screening would
not have been detectable with traditional two-dimensional chest
x-ray screening, given chest x-ray's poorer resolution and capture
of substantially less radiologic information. At a specific point in 
time, a cancer could be in Phase B if the cancer screening modal- 
ity is LDCT and Phase A if the cancer screening modality is chest 
x-ray. Use of chest x-ray screening for lung cancer was common 
in the later decades of the 20th century but is no longer standard 
of care. 
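
The dependence of Phases A and B on the screening test can be made concrete with a small sketch. It assumes, purely for illustration, that detectability depends only on tumor size and that each modality has a single size threshold. The threshold values and the function name are hypothetical and are not measured properties of chest x-ray or LDCT.

```python
from enum import Enum

class Phase(Enum):
    A = "present, asymptomatic, undetectable with screening"
    B = "present, asymptomatic, detectable with screening"
    C = "present, symptomatic"
    D = "death due to cancer"

# Hypothetical smallest detectable tumor diameter (mm) for each modality.
# These thresholds are invented for illustration, not measured values.
DETECTION_THRESHOLD_MM = {"chest x-ray": 20.0, "LDCT": 5.0}

def phase_of_asymptomatic_cancer(tumor_mm: float, modality: str) -> Phase:
    """Classify an asymptomatic cancer as Phase A or B for a given test."""
    if tumor_mm >= DETECTION_THRESHOLD_MM[modality]:
        return Phase.B
    return Phase.A

# The same tumor, at the same point in time, under two modalities.
tumor_mm = 8.0
for modality in DETECTION_THRESHOLD_MM:
    print(modality, phase_of_asymptomatic_cancer(tumor_mm, modality).name)
```

Under these assumed thresholds the same 8 mm tumor is in Phase A when the modality is chest x-ray and in Phase B when the modality is LDCT, which is the sense in which the first two phases are a property of the test as well as of the cancer.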

The purpose of the four-phase model is not to demonstrate all
possible paths that cancer or an individual with cancer can experi- 
ence. It assumes that all cancers progress through each phase and 
do so only in a forward fashion, even though we know that some 
cancers regress or stall. It assumes that all cancers would be fatal 
if not treated, though experience tells us otherwise. It assumes 
that death due to causes other than the cancer of interest cannot 
occur. Even with those exclusions, the model is useful in
conceptualizing our goals in cancer screening and provides a 
vocabulary that helps us discuss cancer screening. 

Cancer screening attempts to shift the diagnosis of Phase C 
cancers to Phase B. Cancer screening is predicated on the belief 
that treatment at Phase B is more likely to lead to cure or exten- 
sion of life than treatment at Phase C. If treatment at Phase B 
offers no prognostic advantage over treatment at Phase C, cancer 
screening will not lead to a reduction in cause-specific mortality. 
Treatment options may be more palatable, however, for Phase B 
cancers, and morbidity associated with the cancer and its treat- 
ment may be reduced. Then again, a diagnosis that occurs at an 
earlier time leads to more time spent as a cancer patient, which 
has psychological and clinical implications as discussed in the 
Benefit Versus Harm section of Chap. 1 (Sect. 1.10).


2.2 Three Important Phenomena in Screen 
Detection of Cancer 


Lead time, length-biased sampling, and overdiagnosis are three 
terms that are used frequently in the assessment of cancer screen- 
ing. They refer to the shift to an earlier date of diagnosis with 
cancer screening (lead time) and selection of a prognostically 
favorable (and thus non-random) sample of cancers (length- 
biased sampling and overdiagnosis). The remainder of this chap- 
ter explains these phenomena, while Chap. 5 describes how they 
complicate interpretation of cancer screening data. 

The phrase length-biased sampling becomes awkward when 
we speak of bias due to length-biased sampling. The phrase length 
time, which is sometimes used instead of length-biased sampling, 
isn’t a better choice, as it is not particularly descriptive. The phrase 
length-weighted sampling will be used instead in the remainder of 
this primer. 

In the remainder of this primer, the phrase three cancer screening 
phenomena refers to lead time, length-weighted sampling, and 
overdiagnosis. 


2.2.1 Lead Time 


Screen-detected cancers are diagnosed at an earlier point in time 
than they would have been in the absence of cancer screening. 
Lead time is the amount of time by which the diagnosis date was 
advanced by cancer screening. It is that shifting of the diagnosis 
date to an earlier time that leads many to refer to cancer screening 
as early or earlier detection. 

Lead time for an actual individual is impossible to calculate 
because we cannot know what the date of symptomatic diagnosis 
would have been in the absence of cancer screening. But for the 
purpose of illustration, we will pretend that we know that date. 
Lead time would be 3 months if, in the absence of cancer screen- 
ing, a cancer would have been detected on June 1, 2018, but, in 
the presence of cancer screening, was diagnosed on March 1, 
2018. 
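The arithmetic in this illustration can be sketched in a few lines of Python. This is only a sketch: the function name `lead_time_days` is invented for illustration, and the symptomatic diagnosis date is, as noted above, something we pretend to know.

```python
from datetime import date


def lead_time_days(screen_diagnosis: date, symptomatic_diagnosis: date) -> int:
    """Number of days by which screening advanced the date of diagnosis."""
    return (symptomatic_diagnosis - screen_diagnosis).days


# Dates from the illustration; the symptomatic date is hypothetical by nature.
screen_diagnosis = date(2018, 3, 1)
symptomatic_diagnosis = date(2018, 6, 1)

days = lead_time_days(screen_diagnosis, symptomatic_diagnosis)

# Whole calendar months; valid here because both dates fall on the 1st.
whole_months = (
    (symptomatic_diagnosis.year - screen_diagnosis.year) * 12
    + (symptomatic_diagnosis.month - screen_diagnosis.month)
)

print(f"Lead time: {days} days (about {whole_months} months)")
```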


2.2.2 Length-Weighted Sampling 


The term length-weighted sampling refers to the fact that the 
chance of screen detection is dependent on the length of time 
(sometimes referred to as sojourn time) the cancer remains in 
Phase B. The term sampling is used because cancer screening is 
merely a selection of some cancers (those that are screen detected) 
from the pool of all cancers. In elementary probability classes, 
sampling is often demonstrated using a jar of marbles. If the mar- 
bles are all of the same size, each marble has the same chance of 
selection. If the marbles are of different sizes, the chance that a 
given marble is selected is determined by its size, with chance of 
selection positively correlated with size of a marble. Cancer 
screening is similar to the latter situation; as the reader will see, 
one particular tumor characteristic, time spent in Phase B, drives 
the chance of detection. 

Recall that not all cancers can be detected through screening. 
The chance that a cancer will be screen detected is dependent to 
differing degrees on many factors, including cancer characteristics, 
screenee characteristics, the cancer screening test, and screening 
interval. Screening interval refers to the amount of time between 
screens. An annual screening interval is used (or was used until 
we learned more about cancer progression) for a number of can- 
cer screening tests. 

Most notably, the chance of screen detection is inversely asso- 
ciated with the speed of tumor growth: faster growing cancers 
spend less time in Phase B and consequently have less time to be 
screen detected. A cancer that spends only 3 months in Phase B, 
for example, will have no opportunity to be detected on an annual 
screen, unless the annual screen happens to occur during that 
three-month window. A cancer that spends 2 years in Phase B will 
have two annual screens on which it can be detected. Cancers with 
a longer Phase B are assumed to be slower growing than those 
with a shorter Phase B, and cancers that are slower growing are 
assumed to have better prognosis. Therefore, length-weighted 
sampling leads to detection of cancers through screening that are 
expected to have better prognosis than those that are not detected 
through screening. 

The term weighted is used to mean that the sample of cancers 
detected through screening will be skewed in favor of cancers 
with more favorable prognosis. In other words, the cancers that 
screening detects do not represent a random sample of all cancers. 
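A small numerical sketch can make the weighting concrete. It rests on two simplifying assumptions that are not part of the text: Phase B is assumed to begin at a uniformly random point in the screening cycle, and the four Phase B lengths are hypothetical and equally common. Under the first assumption, the chance that at least one screen falls within Phase B is the Phase B length divided by the screening interval, capped at 1. Test sensitivity is ignored.

```python
def chance_of_screen_in_phase_b(sojourn_months: float, interval_months: float) -> float:
    """Chance that at least one screen falls inside Phase B.

    Assumes Phase B starts at a uniformly random point in the screening cycle.
    """
    return min(1.0, sojourn_months / interval_months)


# Hypothetical Phase B lengths (months), assumed equally common among all cancers.
sojourn_times = [3, 6, 12, 24]
interval = 12  # annual screening

weights = [chance_of_screen_in_phase_b(s, interval) for s in sojourn_times]

# Average Phase B length among all cancers.
mean_all = sum(sojourn_times) / len(sojourn_times)

# Average Phase B length among cancers that are in Phase B at some screen,
# weighting each cancer by its chance of being there.
mean_screenable = sum(s * w for s, w in zip(sojourn_times, weights)) / sum(weights)

print(f"All cancers: {mean_all:.2f} months")
print(f"Cancers available for screen detection: {mean_screenable:.2f} months")
```

In this sketch the cancers available for screen detection have a longer average Phase B (about 14.5 months) than the full pool of cancers (11.25 months), which is the skew that the word weighted describes.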

Figure 2.2 depicts the fictional experience of three screenees, 
Mike, Molly, and Mary. The example demonstrates the interplay 
of screening interval and length of Phase B. 

Mike, Molly, and Mary have Phase A cancers at the time of the 
first screen (month 0), so none of the three has cancer detected. 
Each cancer enters Phase B at month 6. Mike’s cancer enters 
Phase C at month 9, leading to a symptom-driven diagnosis prior 
to the second screen. Molly’s cancer is in Phase B at month 12, 
the time of her second screen, and the cancer is screen detected. 
Mary’s cancer also is in Phase B at month 12, but her cancer is 
missed at the second screen. Because Mary’s cancer is still in 
Phase B at the time of her third screen (24 months), there is 
another opportunity to detect it through screening, and that 
happens. 

[Figure: timelines for Mike, Molly, and Mary showing Phase B; x-axis: Month, 0 to 27]


Fig. 2.2 The interplay of Phase B length, a one-year screening interval, and 
cancer diagnosis. X indicates cancer diagnosis. Experience is fictional 


What would have happened had a different screening interval 
been used? A six-month interval could have led to screen detec- 
tion for Mike, because the shorter the screening interval, the more 
likely that cancers with short Phase B lengths will be detected 
through cancer screening. The impact of a two-year interval on 
Molly’s experience depends on the length of Phase B: if it had 
been less than 18 months, Molly’s cancer could not have been 
screen detected. 
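The interplay of screening interval and Phase B length can be sketched as a check of which screens fall inside Phase B. Several things here are assumptions made for illustration: the function name, the treatment of a screen that coincides with the first month of Phase B as falling inside it, and the two Phase B end dates tried for Molly (the text does not give one). A screen falling in Phase B is only an opportunity for detection; as Mary's experience shows, the cancer can still be missed.

```python
def screens_during_phase_b(phase_b_start: int, phase_b_end: int,
                           interval: int, horizon: int = 36) -> list[int]:
    """Months at which a screen falls inside Phase B.

    Screens occur at month 0 and every `interval` months up to `horizon`.
    Phase B is treated as the half-open window [phase_b_start, phase_b_end).
    """
    screen_months = range(0, horizon + 1, interval)
    return [m for m in screen_months if phase_b_start <= m < phase_b_end]


mike = (6, 9)          # from the text: enters Phase B at month 6, Phase C at month 9
molly_short = (6, 20)  # hypothetical: Phase B lasts 14 months
molly_long = (6, 30)   # hypothetical: Phase B lasts 24 months

print("Mike, annual screens:       ", screens_during_phase_b(*mike, interval=12))
print("Mike, six-month screens:    ", screens_during_phase_b(*mike, interval=6))
print("Molly (14 mo), 2-year screens:", screens_during_phase_b(*molly_short, interval=24))
print("Molly (24 mo), 2-year screens:", screens_during_phase_b(*molly_long, interval=24))
```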


2.2.3 Overdiagnosis 


Overdiagnosis is the detection through cancer screening of can- 
cers that never would have been diagnosed in the absence of can- 
cer screening. These are cancers that, in the absence of screening, 
would not progress beyond Phase B during the lifetime of the 
screenee. The existence of overdiagnosis in cancer screening was 
once quite controversial, but nowadays many researchers and cli- 
nicians are open-minded to the possibility that some screen- 
detected cancers are diagnosed unnecessarily. Many also believe 
that screening for cancer of any site would result in overdiagnosis. 
The biggest controversy in overdiagnosis today surrounds its 
magnitude, which is further discussed in Chap. 9. 

Overdiagnosis includes, but is not restricted to, the detection of 
indolent cancers. Indolent cancers are those that are screen 
detected (by definition, in Phase B), but in the absence of cancer 
screening and even in the longest of lifetimes, would have 
remained in Phase B, regressed to Phase A, or completely 
resolved. The screen detection of indolent cancers is an extreme 
example of length-weighted sampling. Because overdiagnosis 
refers to screen-detected cancers, cancer detected as a result of 
symptoms (Phase C cancers) cannot be overdiagnosed, even 
though they too, theoretically, can stall, regress, or resolve. 

Non-indolent screen-detected cancers can be overdiagnosed if 
death, due to a cause other than the screen-detected cancer, occurs 
between the date of screen detection and the hypothetical date of 
symptom detection. The phrase competing cause of mortality is 
used to describe this scenario. In Fig. 2.3, John’s cancer is not 
overdiagnosed because he is alive on the day that the cancer would 
have been detected due to symptoms. James, however, dies soon 
after his cancer is diagnosed, and he is not alive on the day that it 
would have been detected due to symptoms had he lived. If not 


John: not an overdiagnosed cancer 

1/1/19: Screen detection 
7/1/19: Cancer diagnosis (hypothetical) in the absence of screening 
12/1/19: Death due to heart attack 


James: an overdiagnosed cancer 

1/1/19: Screen detection 
7/1/19: Death due to heart attack 
12/1/19: Cancer diagnosis (hypothetical) in the absence of screening 


Fig. 2.3 Overdiagnosis due to death from other causes. Experience is 
fictional 

screen detected, James’ cancer would have been in Phase B when 
he died, and neither he nor his health care providers would have 
known of its existence. 
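The rule applied to John and James reduces to a comparison of two dates, which can be sketched as follows. The function name is invented for illustration, the dates are read from Fig. 2.3 as month/day/year, and the date of symptom detection is hypothetical by definition, so this is a thought experiment rather than something that could be computed for a real screenee.

```python
from datetime import date


def overdiagnosed_by_competing_cause(death: date, hypothetical_symptom_dx: date) -> bool:
    """True if death from another cause precedes the date on which the
    screen-detected cancer would have been diagnosed due to symptoms."""
    return death < hypothetical_symptom_dx


# John: symptoms would have led to diagnosis on 7/1/19; he died on 12/1/19.
john = overdiagnosed_by_competing_cause(
    death=date(2019, 12, 1), hypothetical_symptom_dx=date(2019, 7, 1)
)

# James: he died on 7/1/19; symptoms would have led to diagnosis on 12/1/19.
james = overdiagnosed_by_competing_cause(
    death=date(2019, 7, 1), hypothetical_symptom_dx=date(2019, 12, 1)
)

print("John overdiagnosed:", john)
print("James overdiagnosed:", james)
```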

Today’s technology generally does not allow for the classifica- 
tion of a tumor as clearly indolent or not. In some instances, tumor 
characteristics can suggest a likely course, be it an innocuous or 
highly aggressive one. Death from a different cause soon after 
screen detection may suggest overdiagnosis associated with a 
competing cause of mortality, while death long after argues 
against it. Of course, uncertainty always exists because the life 
course an individual would have experienced in the absence of 
cancer screening is unknowable. 


3 Performance Measures 


Performance measures reflect the link between cancer screening 
test results and cancer diagnoses. They provide no information 
about cause-specific mortality. Performance measures are used in 
the initial assessment of proposed cancer screening tests and also 
are used to monitor performance once cancer screening has dis- 
seminated. There are six key performance measures, with each 
interpretable as a probability (ranging from 0 to 1) or percentage 
(ranging from 0% to 100%). 

Performance measures are calculated from the experience of 
individuals who have been screened. The cancer screening test 
result and whether cancer was present at the time of the screen 
need to be available for each individual to calculate performance 
measures. 


3.1 The Building Blocks of Performance 
Measures 


3.1.1 Cancer Screening Test Result 


The cancer screening test result is classified as either positive or 
negative. A positive result indicates a suspicion of cancer and the 
need for diagnostic evaluation. A negative result indicates no sus- 
picion of cancer and no need for diagnostic evaluation. The 
definition of a positive test result is not etched in stone; instead, 
the medical community makes recommendations as to what constitutes 
a positive test. In practice, any abnormality deemed suspicious 
by the test interpreter is called positive, regardless of 
whether it meets the recommended definition of a positive test. 
For many cancer screening tests, particularly those that employ 
imaging, it is impossible for recommendations to include every 
finding or constellation of findings that creates a suspicion for 
cancer. 

Recommendations are made after many factors are weighed, 
including the burden of positive tests and the gravity of missing a 
cancer. Medical communities may arrive at different recommen- 
dations. In the US, for example, a prostate-specific antigen (PSA) 
blood level of 4.0 ng/mL or higher is typically considered a posi- 
tive test for prostate cancer, but in parts of Europe, a value of 
3.0 ng/mL or higher is used. 

At the extremes, there tends to be agreement as to whether a 
cancer screening test result should be classified as positive or neg- 
ative. For example, a large spiculated lung mass observed on low 
dose computed tomography (LDCT) would be classified as posi- 
tive for lung cancer, while a mammogram that shows only the 
anatomic structures of the breast would be called negative for 
breast cancer. The challenge comes when it is not obvious what a 
finding represents: a result that isn’t exactly negative and isn’t 
exactly positive. There is a move towards classifying these grey- 
zone findings as indeterminate and employing a less intense and 
usually non-invasive form of diagnostic evaluation. Some may 
disagree with use of the phrase diagnostic evaluation in the 
instance of indeterminates, as the recommended medical inter- 
vention is intended to watch for change in the abnormality rather 
than determine whether it is cancer. In that instance, the term 
monitoring can be used. For the purpose of calculating perfor- 
mance measures, I classify indeterminate cancer screening test 
results as positive. In my opinion, any cancer screening test that is 
not negative is positive, as it leaves uncertainty in the mind of the 
clinician and screenee. 

Some biospecimen-based cancer screening tests return a 
numeric value or other quantitative measure. These values 
correlate with the chance of the presence of cancer. PSA is one 
such test. A value of 4 ng/mL or greater is usually considered a 
positive result in the United States, but active surveillance rather 
than biopsy is often recommended if the PSA is between 4 ng/mL 
and 10 ng/mL. A value of 10 ng/mL or greater, however, typically 
leads to imaging or biopsy. Other biospecimen-based cancer 
screening tests, such as cervical cytology, indicate whether abnor- 
mal cells are present. One form of cervical cancer screening, 
human papillomavirus (HPV) testing, indicates whether certain 
cancer-causing strains of HPV are present rather than indicating 
whether an abnormality suspicious for cancer is present. 

Imaging-based cancer screening tests are used to determine if 
abnormalities are present. A cancer screening test will be called 
positive if an abnormality suspicious for cancer is revealed. These 
tests also can reveal abnormalities that are not suspicious for can- 
cer and abnormalities whose significance with regard to cancer is 
unknown. Lung cancer screening with LDCT, for example, can 
lead to detection of non-calcified nodules (positive if above a cer- 
tain size), calcified nodules (usually negative), or ground glass 
opacities (oftentimes of uncertain significance). Some imaging- 
based cancer screening tests also can lead to detection of abnor- 
malities that represent or are suspicious for non-cancer conditions, 
called incidental findings or incidentalomas. For example, LDCT 
screening for lung cancer can lead to the detection of coronary 
artery calcification. 


3.1.2 Cancer: Present or Not? 


Cancer is either present or not present at the time of the cancer 
screening test, though only some cancers that are present can be 
detected through cancer screening. Recall from Chap. 2 (Fig. 2.1) 
that Phase A cancers are present but not detectable, while Phase B 
cancers are present and have characteristics that should make 
them detectable. Knowing whether a cancer is present and detect- 
able at the time of a cancer screening test is often not as simple as 
the four phase model implies, though. The most challenging 
aspect is determining whether a negative screen that occurred 
prior to a symptom-detected cancer represents a true negative or a 
false negative, terms that are fairly self-explanatory and will be 
discussed later in this chapter. The following fictional scenarios 
represent quandaries that researchers face when trying to assess 
whether a Phase B cancer was present at the time of a negative 
screen: 

Amanda had a lung cancer screening test and the result was 
negative. Three months later, she receives a symptom-prompted 
diagnosis of lung cancer. Was the cancer missed on screening, or 
was the cancer in Phase A at the time of the screening but moved 
through Phase B very quickly? Did the cancer exist at the time of 
the screen? 

Arnie had a prostate cancer screening test and the result was 
positive. He received standard diagnostic evaluation and his clini- 
cian concluded that he did not have prostate cancer. Nine months 
later, he receives a symptom-prompted diagnosis of prostate can- 
cer. Was the cancer in Phase B at the time of the screen but diag- 
nostic evaluation failed in some way? Is the diagnosed cancer a 
new and fast-growing abnormality? In other words, did the diagnosed 
cancer arise from an abnormality other than the one that 
prompted the positive result? 

Astrid schedules her screening mammogram. Two days before 
the test, she finds a breast lump but does not tell anyone. Her 
mammogram is positive and diagnostic evaluation indicates that 
the lump she found is cancer. Astrid’s cancer was present at the 
time of her mammogram, but should the test be considered a 
screening mammogram or a diagnostic mammogram? 

The phrase interval cancer is used to describe cancers that 
occur between screening rounds and follow either a negative test 
or a resolved positive test. Resolved means that the conclusion of 
the diagnostic evaluation was that cancer was not present. 
Amanda’s cancer and Arnie’s cancer are interval cancers regard- 
less of whether they were in Phase A or B at the time of the screen. 
If in Phase B, the previous screening test would be classified as a 
false negative. 

It is clear that Astrid’s cancer was present and in Phase C at the 
time of the screening test. The cancer could be classified as an 
interval cancer because it was symptomatic before the screen. 
Then again, it could be classified as screen detected because the 
screening test result was positive, even though it was beyond 
Phase B. Cancer screening tests can miss Phase C cancers, and 
that could have been Astrid’s experience. 

Most screen-detected cancers are in Phase B at the time of the 
cancer screening test. For simplicity’s sake Phase C cancers that 
are detected as the result of cancer screening will be excluded for 
the remainder of this primer. 


3.2 Calculating Cancer Screening 
Performance Measures 


The six cancer screening performance measures are sensitivity, 
specificity, positive predictive value (PPV), negative predictive 
value (NPV), false positive rate (FPR), and false negative rate 
(FNR). The receiver operating characteristic (ROC) curve is a 
graph of sensitivity versus FPR, which is equal to 1 minus speci- 
ficity. The ROC curve demonstrates how those two values vary as 
the definition of a positive test changes. Its summary measure, 
area under the curve (AUC), is calculated so that ROC curves can 
be compared. 


3.2.1 The Formulas 


Table 3.1 presents the quantities that are needed to calculate per- 
formance measures. The four quantities in the center of the table 
are at the heart of performance measure calculations. They are 
true positive tests (a), false positive tests (b), false negative tests 
(c), and true negative tests (d). True positive tests are those posi- 
tive tests that led to the diagnosis of a cancer, and true negative 
tests are those negative tests that correctly indicated no suspicion 

Table 3.1 The components of performance measure formulas 


 | Truth 
Screening test result | Cancer present (in Phase B) | Cancer not present (includes Phase A) | Total 
Positive | a (true positives) | b (false positives) | a + b (all positives) 
Negative | c (false negatives) | d (true negatives) | c + d (all negatives) 
Total | a + c (cancers present) | b + d (cancers not present) | a + b + c + d (all screenees) 


Cancers in Phase C can be screen detected, but most screen-detected cancers 
are in Phase B. For simplicity’s sake, Phase C cancers are not included as 
cancers that are screen detected 


of cancer. False positive tests are sometimes called false alarms; 
the test suggests something suspicious, but diagnostic evaluation 
reveals that cancer is not present. False negative tests are incor- 
rectly negative: cancer is present and in Phase B, but the cancer 
screening test result is negative. Because Phase A cancers cannot 
be detected by cancer screening, they are considered not to be 
present when calculating performance measures. 

The six performance measures are defined as follows. The for- 
mulas use the notation in Table 3.1. 


• Sensitivity, sometimes abbreviated as Se, is the percentage of 
people with cancer who had a positive test; a/(a + c). 

• Specificity, sometimes abbreviated as Sp, is the percentage of 
people without cancer who had a negative test; d/(b + d). 

• PPV is the percentage of people with a positive test who had 
cancer; a/(a + b). 

• NPV is the percentage of people with a negative cancer screening 
test who did not have cancer; d/(c + d). 

• FPR is the percentage of people without cancer who had a 
positive test; b/(b + d). FPR equals 1 minus specificity. 

• FNR is not typically reported but will be defined here for 
completeness’ sake. It is the percentage of people with cancer who 
had a negative cancer screening test; c/(a + c). FNR equals 
1 minus sensitivity. 


Positivity and negativity rate usually are not referred to as perfor- 
mance measures, but it is important to present them nonetheless: 


• Positivity rate is the percentage of people screened who had a 
positive test; (a + b)/(a + b + c + d). 

• Negativity rate is the percentage of people screened who had a 
negative test; (c + d)/(a + b + c + d). 


Table 3.2 presents data from the Breast Cancer Surveillance 
Consortium (BCSC) Data Explorer, a public-access database of 
mammographic breast cancer screening experience from 1994 
through 2009 [1]. These data are used in Table 3.3 to calculate the 
performance measures. 

In the BCSC example, sensitivity and specificity are fairly 
high, as is often the case with cancer screening tests that are used 
in population-based cancer screening. The manner in which a 
positive test is defined generally drives sensitivity and specificity, 


Table 3.2 Screening mammogram classification among women ages 50 to 
59 at the time of screening. Breast Cancer Surveillance Consortium Data 
Explorer, 1994-2009 


 | Truth 
Screening test result | Cancer present | Cancer not present | Total 
Positive | 7044 (true positives) | 165,115 (false positives) | 172,159 
Negative | 1534 (false negatives) | 1,623,399 (true negatives) | 1,624,933 
Total | 8578 | 1,788,514 | 1,797,092 (all screens) 


Table 3.3 Performance measures, positivity rate, and negativity rate: formulas 
and calculations using data from Table 3.2 


Performance measure | Formula using Table 3.1 notation | Calculation using Table 3.2 data 
Sensitivity | a/(a + c); true positives/cancers present | 7044/8578 = 82% 
Specificity | d/(b + d); true negatives/cancer not present | 1,623,399/1,788,514 = 91% 
PPV | a/(a + b); true positives/all positives | 7044/172,159 = 4% 
NPV | d/(c + d); true negatives/all negatives | 1,623,399/1,624,933 > 99% 
FPR | b/(b + d); false positives/cancer not present; also equal to 1 - specificity | 165,115/1,788,514 = 9% 
FNR | c/(a + c); false negatives/cancers present; also equal to 1 - sensitivity | 1534/8578 = 18% 
Positivity rate | (a + b)/(a + b + c + d); all positives/all screened | 172,159/1,797,092 = 10% 
Negativity rate | (c + d)/(a + b + c + d); all negatives/all screened | 1,624,933/1,797,092 = 90% 


as do the capabilities and limitations of the cancer screening test 
itself. The definition of a positive screen is chosen so that most 
cancers are found (high sensitivity) and the absolute number of 
false positives is kept as low as possible (low FPR, or high speci- 
ficity). NPV is high as well but PPV is very low. 
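The Table 3.3 calculations can be reproduced with a short Python sketch. The counts a, b, c, and d are the BCSC values from Table 3.2; the dictionary and variable names are simply illustrative choices.

```python
# Counts from Table 3.2 (BCSC, women ages 50 to 59), in Table 3.1 notation.
a = 7044          # true positives
b = 165_115       # false positives
c = 1534          # false negatives
d = 1_623_399     # true negatives
total = a + b + c + d

measures = {
    "sensitivity": a / (a + c),
    "specificity": d / (b + d),
    "PPV": a / (a + b),
    "NPV": d / (c + d),
    "FPR": b / (b + d),          # equals 1 - specificity
    "FNR": c / (a + c),          # equals 1 - sensitivity
    "positivity rate": (a + b) / total,
    "negativity rate": (c + d) / total,
}

for name, value in measures.items():
    print(f"{name:16s} {value:6.1%}")
```

Printing to one decimal place shows how much rounding sits behind the whole percentages in Table 3.3; PPV, for example, is closer to 4.1% than to 4%.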


3.2.2 The Relationship Between PPV, NPV, 
and Prevalence 


PPV and NPV are driven by sensitivity and specificity, and they 
also are driven by the prevalence of disease. PPV and NPV can be 
calculated from sensitivity, specificity, and the prevalence of disease 
using the formulas in Box 3.1. 

Box 3.1 Calculating PPV and NPV from sensitivity (Se), 
specificity (Sp), and prevalence 


PPV = (Se x prevalence) 
      / (Se x prevalence + (1 - Sp) x (1 - prevalence)) 


NPV = (Sp x (1 - prevalence)) 
      / (Sp x (1 - prevalence) + (1 - Se) x prevalence) 


The Box 3.1 PPV formula indicates that PPV always will be 
low in the instance of a rare disease (low prevalence) because the 
numerator will be substantially smaller than the denominator. The 
Box 3.1 NPV formula indicates that NPV always will be high in 
the instance of a rare disease because the numerator and denominator 
will be nearly the same. Those statements are true because 
any quantity multiplied by prevalence, namely (Se x prevalence) in 
the PPV formula and ((1 - Se) x prevalence) in the NPV formula, will 
be close to zero when prevalence is low. Table 3.4 presents, using the 
BCSC sensitivity (82%) 
and specificity (91%), values of PPV and NPV for a range of 
prevalence values. The annual prevalence in the BCSC cohort is 
approximately 500 per 100,000 women. In Table 3.4, notice that 
PPV increases as prevalence increases, but it takes an implausible 
prevalence, 100 times that of the prevalence observed in the BCSC 
cohort (50,000 per 100,000 women), for PPV to rise to 90%. A 
prevalence of 50,000 per 100,000 women means that every other 
woman has breast cancer, something that is far from true for any 
cancer. 


Table 3.4 PPV and NPV by prevalence of disease (sensitivity of 82% and 
specificity of 91%) 


Prevalence | PPV | NPV 
250 per 100,000 | 2.2% | >99% 
500 per 100,000 | 4.3% | >99% 
1000 per 100,000 | 8.2% | >99% 
50,000 per 100,000 | 90.8% | 83.5% 


Data are fictional 
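The Box 3.1 formulas can be applied directly, as in the sketch below. The function name `ppv_npv` is illustrative, and one assumption is made: sensitivity and specificity are taken unrounded from the Table 3.2 counts rather than as 82% and 91%, because the rounded values give a slightly different PPV at some prevalences (for example, about 4.4% rather than 4.3% at 500 per 100,000).

```python
def ppv_npv(se: float, sp: float, prevalence: float) -> tuple[float, float]:
    """PPV and NPV from sensitivity, specificity, and prevalence (Box 3.1)."""
    ppv = (se * prevalence) / (se * prevalence + (1 - sp) * (1 - prevalence))
    npv = (sp * (1 - prevalence)) / (sp * (1 - prevalence) + (1 - se) * prevalence)
    return ppv, npv


# Unrounded BCSC sensitivity and specificity, from the Table 3.2 counts.
se = 7044 / 8578
sp = 1_623_399 / 1_788_514

results = {}
for per_100k in (250, 500, 1000, 50_000):
    prevalence = per_100k / 100_000
    results[per_100k] = ppv_npv(se, sp, prevalence)
    ppv, npv = results[per_100k]
    print(f"{per_100k:>6} per 100,000: PPV {ppv:6.1%}  NPV {npv:6.1%}")
```

At the three realistic prevalences NPV stays above 99% while PPV stays in single digits. At the implausible prevalence of one in two, the relationship reverses: PPV is high and NPV falls well below 99%.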

Table 3.5 PPV as a function of sensitivity and specificity (disease prevalence 
of 500 per 100,000) 


 | Sensitivity 
Specificity | 90% | 95% | 99% 
90% | 4.3% | 4.6% | 4.7% 
95% | 8.3% | 8.7% | 9.0% 
99% | 31.1% | 32.3% | 33.2% 


Data are fictional 


Those who are new to assessment of cancer screening often are 
amazed that PPV is so low for cancer screening tests even when 
sensitivity and specificity are high. Table 3.5 demonstrates, for a 
typical cancer prevalence of 500 per 100,000, how changes in 
sensitivity and specificity affect PPV. Notice that even at a 
sensitivity and specificity of 99%, values that are yet to be 
achieved for cancer screening modalities, PPV is only 33%. The 
data in Table 3.5 demonstrate that it is virtually impossible for 
PPV to rise above 10% given typical prevalence, sensitivity, and 
specificity associated with today’s cancer screening tests. 
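The Table 3.5 grid follows from the Box 3.1 PPV formula with prevalence held at 500 per 100,000. The sketch below recomputes it; the function and variable names are illustrative only.

```python
def ppv(se: float, sp: float, prevalence: float) -> float:
    """PPV from sensitivity, specificity, and prevalence (Box 3.1)."""
    return (se * prevalence) / (se * prevalence + (1 - sp) * (1 - prevalence))


prevalence = 500 / 100_000
levels = (0.90, 0.95, 0.99)

# grid[(sp, se)] holds the PPV for that specificity and sensitivity.
grid = {(sp, se): ppv(se, sp, prevalence) for sp in levels for se in levels}

print("Specificity | " + " | ".join(f"Se {se:.0%}" for se in levels))
for sp in levels:
    row = " | ".join(f"{grid[(sp, se)]:6.1%}" for se in levels)
    print(f"{sp:11.0%} | {row}")
```

Reading across a row, raising sensitivity from 90% to 99% moves PPV by less than half a percentage point when specificity is 90%. Reading down a column, raising specificity has a much larger effect, because specificity governs the number of false positives generated by the large group of people without cancer.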


3.2.3 The Implications of Low PPV 


A low PPV indicates that most positive cancer screening tests are 
false alarms. A PPV of 4% means that 96% of positive tests do not 
lead to a cancer diagnosis. In the BCSC data (Table 3.2), there are 
roughly 7000 true positives but 165,000 false positives. There is 
disagreement as to whether false positives should be classified as a harm 
of cancer screening. One point of view is that any test, including 
the diagnostic evaluation tests that accompany a false positive, is 
a test worth having if it rules out cancer. The other point of view 
is that false positives are a harm of cancer screening as they cause 
patients to worry unnecessarily and to receive unneeded medical 
tests and procedures, some of which can be risky. 


3.2.4 Can PPV Be Improved? 


As was demonstrated in Box 3.1, PPV depends on three quanti- 
ties: prevalence, sensitivity, and specificity. Disease prevalence is, 
for all intents and purposes, not modifiable (and definitely not in 
the short term), and while we do have some control over sensitiv- 
ity and specificity, their upper bounds are determined, realisti- 
cally, by the abilities of the cancer screening tests. So PPV will 
remain low. And cancer screening will continue to generate many 
more false than true positive tests. 

Recall that the intent of cancer screening is not to diagnose; 
rather it is to identify individuals who need additional medical 
attention to determine if they have cancer or to rule that out. A 
cancer screening test with a value of 100% for sensitivity, speci- 
ficity, PPV, and NPV would be possible if a cancer screening test 
had perfect discriminatory ability, which is contrary to the goal of 
cancer screening. We could guarantee 100% sensitivity by assign- 
ing a positive test result to every screening test, but in that instance, 
PPV will still be low: it will equal the prevalence of the cancer. 
We could guarantee 100% specificity by assigning a negative test 
result to every cancer screening test, but in that instance, no can- 
cers would be screen detected. 
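The two extreme strategies can be checked with the Table 3.1 notation. The sketch below borrows the BCSC column totals from Table 3.2 purely as convenient numbers; the reassignment of every test to positive, or every test to negative, is hypothetical.

```python
# Column totals from Table 3.2: screens with and without cancer present.
cancers = 8578
no_cancers = 1_788_514
prevalence = cancers / (cancers + no_cancers)

# Strategy 1: call every screening test positive (c = d = 0).
a, b, c, d = cancers, no_cancers, 0, 0
sensitivity_all_positive = a / (a + c)
ppv_all_positive = a / (a + b)

# Strategy 2: call every screening test negative (a = b = 0).
a, b, c, d = 0, 0, cancers, no_cancers
specificity_all_negative = d / (b + d)
screen_detected_all_negative = a

print(f"All positive: sensitivity {sensitivity_all_positive:.0%}, "
      f"PPV {ppv_all_positive:.2%} (prevalence {prevalence:.2%})")
print(f"All negative: specificity {specificity_all_negative:.0%}, "
      f"screen-detected cancers {screen_detected_all_negative}")
```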


3.3 ROC Curves and AUC 


An ROC curve demonstrates the trade-off between detecting more 
cancers and increasing the FPR. The curve is formed by graphing 
the sensitivity and FPR for different definitions of positivity. 
Usually an established screening cohort with information on can- 
cer diagnoses and specifics of what was observed on the cancer 
screening test (rather than only a positive/negative test result clas- 
sification) is used and scenarios are created. Prostate cancer 
screening provides a straightforward example. A PSA of 4 ng/mL 
or greater is the usual definition of a positive prostate cancer 
screening test in the US, but what would have happened if the cut-off 
had been 3 ng/mL or 5 ng/mL, say? How many additional cancers 
would be detected with the lower cut-off, and how many addi- 
tional cancers would be missed with the higher cut-off? The FPR 
would increase with the lower cut-off and decrease with the higher 
cut-off, but by how much? 

ROC curves provide useful comparisons, though it is necessary 
to make assumptions when using the scenarios. We must assume 
that the experience that follows the cancer screening test is the 
same regardless of the positivity definition employed. For exam- 
ple, we must assume that any cancer diagnosed through cancer 
screening ultimately would be symptom-detected (no overdiagno- 
sis), and we also must assume that the intensity of diagnostic eval- 
uation is the same regardless of the positivity definition employed. 
An ROC curve is built by selecting a finite number of positivity 
definitions, graphing the sensitivities and FPRs that would have 
resulted from those positivity definitions, and connecting the dots 
either in a linear fashion or by way of smoothing. 

Examples of cancer screening test ROC curves can be found in 
the biomedical literature [2-4]. For illustrative purposes, the 
BCSC data presented in Table 3.2 were used to lay the foundation 
for a fictional ROC curve. 


3.3.1 ROC Curves 


Figure 3.1 presents our fictional ROC curve. Sensitivity is plotted 
along the y-axis and the FPR is plotted along the x-axis. The ROC 
curve rises steeply as sensitivity moves away from zero, indicat- 
ing a large gain in sensitivity with only small increases in FPR. All 
ROC curves have a turning point, a point at which the incremental 
ability to improve sensitivity becomes increasingly more expen- 
sive in terms of FPR. 

All ROC curves include the points [0,0] and [1,1]; it is the path 
the curve takes from [0,0] to [1,1] that varies. [0,0] represents the 
unrealistic situation in which all tests are negative, which results in 
a sensitivity of 0 and an FPR of 0. [1,1] represents the 
unrealistic situation in which all tests are positive, which results in 
a sensitivity of 1 and an FPR of 1. 


Fig. 3.1 ROC curve 


ROC curves can be created for cancer screening tests that 
return continuous measures, such as PSA, by selecting and vary- 
ing the value that defines positivity. They also can be used for tests 
that return categorical classifications, such as the BI-RADS clas- 
sification for breast abnormalities [5], by collapsing the categories 
into only two: positive and negative. Let’s say that a cancer 
screening test returns a value of 1, 2, 3, 4, or 5. To create the ROC 
curve, a positive test result could be defined as a value of 2 or 
greater, a value of 3 or greater, or a value of 4 or greater. Sensitivity 
and FPR would then be calculated for each of the three scenarios 
to create the ROC curve. 
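The collapsing of categories can be sketched as follows. The counts of screenees at each test value are entirely hypothetical and were chosen only so that the arithmetic is easy to follow; they do not come from any screening cohort.

```python
# Hypothetical counts of screenees by test value (1 to 5); not from the source.
with_cancer = {1: 5, 2: 10, 3: 15, 4: 30, 5: 40}           # 100 screenees
without_cancer = {1: 600, 2: 250, 3: 100, 4: 40, 5: 10}    # 1000 screenees


def roc_point(cutoff: int) -> tuple[float, float]:
    """Sensitivity and FPR when a value of `cutoff` or greater is called positive."""
    true_positives = sum(n for value, n in with_cancer.items() if value >= cutoff)
    false_positives = sum(n for value, n in without_cancer.items() if value >= cutoff)
    sensitivity = true_positives / sum(with_cancer.values())
    fpr = false_positives / sum(without_cancer.values())
    return sensitivity, fpr


# One ROC point per positivity definition described in the text.
points = {cutoff: roc_point(cutoff) for cutoff in (2, 3, 4)}

for cutoff, (sensitivity, fpr) in points.items():
    print(f"Positive if value >= {cutoff}: sensitivity {sensitivity:.2f}, FPR {fpr:.2f}")
```

In this made-up example, loosening the definition from 4 or greater to 2 or greater raises sensitivity from 0.70 to 0.95 but raises the FPR from 0.05 to 0.40.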

The ROC curve (Fig. 3.1) was created using a small number of 
data points for ease of calculation and presentation. The points 
were developed by modifying the BCSC values of sensitivity 
(82%) and FPR (9%) (Table 3.3): the numbers of false positives 
and false negatives were varied by percentages rather than by 
examining test findings and reclassifying them according to new 
positivity definitions. The actual point and the derived points are 
presented in Table 3.6. 
Table 3.6 Values of sensitivity and FPR used to calculate the ROC curve in 
Fig. 3.1 

Sensitivity | FPR  | Veracity 
0.41        | 0.05 | Fictional 
0.62        | 0.07 | Fictional 
0.82        | 0.09 | Actual 
0.87        | 0.32 | Fictional 
0.91        | 0.55 | Fictional 

[Figure: the ROC curve with the actual point labeled; y-axis: sensitivity 
(0 to 1); x-axis: false positive rate (0 to 1)] 

Fig. 3.2 Calculating AUC by partitioning the space under the ROC curve 
into rectangles and triangles 


3.3.2 Calculating AUC 


ROC curves can be summarized and compared by calculating the 
area underneath them. That area, the AUC, is bounded by the curve 
itself, the x-axis, and a vertical line at an FPR of 1 (in effect, a 
right-sided y-axis), and can be calculated using simple formulas for 
area or, if desired, integral calculus. The AUC for the ROC curve in 
Fig. 3.1 is 0.87 and was calculated by dividing the area into 
5 rectangles and 6 triangles and summing those areas (Fig. 3.2). Many 
ROC curves presented in the literature are smoothed, however. 
Smoothing involves advanced mathematics, which is beyond the scope of 
this primer. Smoothing an ROC curve should change the AUC only slightly. 
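
The rectangle-and-triangle approach can be sketched as follows, using 
the Table 3.6 points plus the anchors [0,0] and [1,1]. The function 
name rectangle_triangle_auc is an illustrative assumption. With the 
rounded table values, this sketch gives roughly 0.86, close to the 
0.87 reported; the small gap presumably reflects rounding of the 
tabulated points. 

```python
# (FPR, sensitivity) pairs: the anchors plus the five Table 3.6 points.
points = [
    (0.00, 0.00),
    (0.05, 0.41),
    (0.07, 0.62),
    (0.09, 0.82),
    (0.32, 0.87),
    (0.55, 0.91),
    (1.00, 1.00),
]


def rectangle_triangle_auc(points):
    """Sum a rectangle and a triangle for each segment between adjacent points."""
    ordered = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        width = x1 - x0
        rectangle = width * min(y0, y1)       # has zero area for the first segment
        triangle = width * abs(y1 - y0) / 2
        area += rectangle + triangle
    return area


auc = rectangle_triangle_auc(points)
print(round(auc, 2))
```

The first segment starts at a sensitivity of 0, so it contributes only 
a triangle; the other five segments contribute a rectangle and a 
triangle each, which matches the 5 rectangles and 6 triangles 
mentioned above. 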
AUC ranges from 0.5 to 1.0. An AUC of 0.5 represents a cancer 
screening test with no discriminatory ability, meaning that the 
result does not depend on whether cancer is present. The cancer 
screening test is, in effect, no better than flipping a (fair) coin to 
assign the result. An AUC of 1.0 indicates perfect discriminatory 
ability: the point [1,0] defines the curve. In that instance, sensitiv- 
ity is 1 and the FPR is 0. The points [0,0] and [1,1] are not viable 
scenarios in cancer screening, but they create standard anchors for 
the curve so that AUCs can be calculated and compared. 


3.4 Performance Measures: Evidence or Not? 


Performance measures are useful for describing the discrimina- 
tory ability of cancer screening tests and for comparing one can- 
cer screening test to another. But they measure the ability of 
cancer screening to lead to detection of cancer, not the ability of 
cancer screening to reduce cause-specific mortality. Chapter 5 
explains that improvement in cancer detection does not guarantee 
a reduction in cause-specific mortality. 

Performance measures are rarely considered sufficient evi- 
dence to implement cancer screening for the first time. However, 
a new cancer screening test, one that is similar to an established 
test known to reduce cause-specific mortality, often disseminates 
into practice if its performance measures are superior to those of 
the older test. Examples include the change from film mammog- 
raphy to digital mammography (breast cancer) [3] and the change 
from guaiac FOBT to immunochemical FOBT, also known as FIT 
(colorectal cancer) [6]. Adoption of new cancer screening tests 
based on comparison of performance measures with those of past 
tests is discussed in more detail in Chap. 8. 
4 Population Measures: Definitions 


Performance measures, the subject of Chap. 3, describe the accu- 
racy of a cancer screening test and its ability to lead to cancer 
detection in a set of screened individuals. They do not, however, 
describe characteristics of the detected cancers or experience after 
diagnosis. Intermediate and definitive outcomes do reflect that 
information. Intermediate and definitive outcomes are measured 
at the population level and therefore are called population mea- 
sures. The term population refers to a group of individuals who 
are either formally offered cancer screening or for whom cancer 
screening is available. Though population can mean residents of a 
geographic region, it does not need to. Population also can refer to 
the types of research populations discussed in Chaps. 6 and 7. 
When assessing the impact of cancer screening on the population- 
level cancer burden, intermediate and definitive outcomes must 
incorporate the experience of all cancers, regardless of the method 
of detection, and include all individuals who were eligible to be 
screened. For example, assessment of intermediate and definitive 
breast cancer screening outcomes would need to be calculated 
using both screen-detected cancers and cancers diagnosed due to 
symptomatic presentation. 

Intermediate outcomes can be measured earlier in time than 
definitive outcomes. Definitive outcomes are not affected by the 
three screening phenomena (lead time, length-weighted sampling, 
and overdiagnosis) that were described in Chap. 2. Favorable 
intermediate outcomes are necessary but not sufficient for favorable 
definitive outcomes. However, intermediate outcomes that 
clearly are not favorable are sufficient evidence that cancer screen- 
ing will not reduce cause-specific mortality. 

Intermediate and definitive outcomes are defined in this chap- 
ter. Examples and associated calculations are presented. Chapter 
5 will address reasons why intermediate outcomes are necessary 
but not sufficient to guarantee a reduction in cause-specific mor- 
tality as well as why definitive outcomes are not affected by the 
screening phenomena. Phenomena that can lead to inaccuracies in 
assignment of cause of death also will be discussed in Chap. 5. 

The first intermediate outcome to be discussed, cancer inci- 
dence, is an intermediate outcome for early detection cancer 
screening, but a definitive outcome for cancer prevention screen- 
ing. The reasons for the different classification are discussed in 
Chap. 8. The discussion of cancer incidence in this chapter is per- 
tinent only to early detection cancer screening. 


4.1 Intermediate Outcomes 
4.1.1 Cancer Incidence 


Incidence reflects the number of cancers that are diagnosed. 
Absolute numbers of cancers can be reported, although an inci- 
dence rate is more commonly used to allow for comparisons 
across populations of different sizes or with different lengths of 
observation. A rate incorporates a unit of time or is stated per an 
amount of time. Incidence rates are calculated as the number of 
cancers diagnosed (numerator) divided by the number of persons 
or person-years at risk for the cancer (denominator). Person-years 
are merely a measure of cumulative time, and are most often used 
in characterizing data from prospective research in which partici- 
pants are monitored for different amounts of time. Incidence rates 
that do not use person-years of experience must state the unit of 
time over which the experience occurred. Most cancer incidence 
rates are age-adjusted because incidence of cancer varies by age. 
There are many examples of cancer incidence rates in the literature. 
A widely-used and well-respected source is SEER, which 
was discussed in Chap. 1. SEER has been collecting data on can- 
cer incidence, cancer mortality and other cancer outcomes in parts 
of the US since the early 1970s [1]. 

Here are two examples: 


• The SEER age-adjusted incidence rate of breast cancer in 2016 
  was 129.81 per 100,000 women per year. SEER reports rates in 
  that manner (rather than person-years) as its focus is on annual 
  measures. The equivalent person-years rate would be 129.81 
  per 100,000 person-years [2]. 

• The lung cancer incidence rate in the low-dose computed 
  tomography (LDCT) arm of the National Lung Screening Trial 
  (NLST) was 645 per 100,000 person-years. That rate was 
  based on 1060 lung cancers and 164,341 person-years of 
  experience [3]. 
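
The NLST figure can be reproduced with a short calculation. The 
sketch below assumes nothing beyond the two numbers quoted above; the 
helper name rate_per is illustrative. 

```python
def rate_per(events, person_years, per=100_000):
    """Events divided by person-years, scaled to a convenient denominator."""
    return per * events / person_years


# NLST LDCT arm: 1060 lung cancers over 164,341 person-years.
nlst_ldct_incidence = rate_per(1060, 164_341)
print(round(nlst_ldct_incidence))  # 645 per 100,000 person-years
```

The scaling constant (here 100,000) changes only how the rate is 
expressed, not the underlying experience. 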


4.1.2 Calculating a Cancer Incidence Rate: 
A Fictional Example 


Table 4.1 presents a cancer incidence rate calculation that uses 
person-years. The data are fictional and not reflective of the typi- 
cal magnitude of cancer incidence. To calculate the incidence 
rate, experience (person-years) is truncated at the date of cancer 
diagnosis because cancer incidence is the outcome of interest. Put 
another way, cancer diagnosis is our endpoint for cancer inci- 
dence. Information on vital status, date of death, and cause of 
death are not needed yet. 

The experience of 5 individuals is presented in Table 4.1. Each 
of the five is followed from the date of his or her 50th birthday. 
Two are diagnosed with cancer. The numerator is the number of 
cancers that were diagnosed (2 in this example). The denominator 
is the sum of the person-years that each individual contributed. 
Deborah and David were diagnosed with cancer, so person-years 
equals the time from the date of the 50th birthday to the date of 
diagnosis. Don and Dudley were not diagnosed with cancer and 
died prior to their 55th birthday, so person-years equals the time 
from the 50th birthday to the date of death. Their person-years are 
truncated at the date of death because they are only at risk of 
cancer while they are alive. Douglas was not diagnosed with cancer 
and was alive at his 55th birthday. He contributes five person-years, 
the maximum time possible in this example. Douglas was 
at risk of cancer for the entire period of observation. 

Table 4.1 Calculating cancer incidence 

Individual | Date turned 50 | Status on 55th birthday            | Date of diagnosis | Date of death or 55th birthday | Person-years contributed 
Douglas    | 1/1/18         | Alive, never diagnosed with cancer | N/A               | 1/1/23                         | 5 years 
Deborah    | 4/1/18         | Dead, diagnosed with cancer        | 10/1/19           | 6/1/21                         | 1.5 years 
Don        | 7/1/18         | Dead, never diagnosed with cancer  | N/A               | 1/1/20                         | 1.5 years 
David      | 10/1/18        | Alive, diagnosed with cancer       | 4/1/22            | 10/1/23                        | 3.5 years 
Dudley     | 1/1/19         | Dead, never diagnosed with cancer  | N/A               | 1/1/21                         | 2 years 

Number of cancers during the 5 year period: 2 

Number of person-years: 5 + 1.5 + 1.5 + 3.5 + 2 = 13.5 

Five-year cancer incidence rate among these individuals: 2/13.5, or 14.8 
per 100 person-years 

Data are fictional 
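
The truncation rules can be sketched in Python using the Table 4.1 
dates. The helper years_between is an illustrative assumption that 
works here only because every date in the table falls on the first of 
a month; real data would need a more careful date calculation. 

```python
from datetime import date


def years_between(start, end):
    """Whole months between two dates on the same day of month, in years."""
    return ((end.year - start.year) * 12 + (end.month - start.month)) / 12


# (name, 50th birthday, date of diagnosis or None, date of death or 55th birthday)
people = [
    ("Douglas", date(2018, 1, 1), None, date(2023, 1, 1)),
    ("Deborah", date(2018, 4, 1), date(2019, 10, 1), date(2021, 6, 1)),
    ("Don", date(2018, 7, 1), None, date(2020, 1, 1)),
    ("David", date(2018, 10, 1), date(2022, 4, 1), date(2023, 10, 1)),
    ("Dudley", date(2019, 1, 1), None, date(2021, 1, 1)),
]

cancers = 0
person_years = 0.0
for name, turned_50, diagnosis, death_or_55th in people:
    # Experience stops at diagnosis if there was one; otherwise at death
    # or the 55th birthday, whichever the table records.
    end = diagnosis if diagnosis is not None else death_or_55th
    person_years += years_between(turned_50, end)
    if diagnosis is not None:
        cancers += 1

rate_per_100 = 100 * cancers / person_years
print(cancers, person_years, round(rate_per_100, 1))  # 2 13.5 14.8
```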


4.1.3 Stage Distribution 


A stage distribution is fashioned from a series of diagnosed cancers. 
It presents the number and percent of cancers that have and 
have not spread. For cancers that have spread, stage distribution 
captures the extent of spread. The predominant staging system is 
the TNM system [4]. The T value indicates the size of the primary 
tumor and the spread into nearby tissue; the N value describes 
spread of cancer to nearby lymph nodes; and the M value describes 
metastasis (spread of cancer to other parts of the body). A 
T1N0M0 breast tumor is one that is invasive, smaller than 
20 millimeters in its greatest dimension, and has not spread to lymph 
nodes or to other organs. 

The TNM staging system is quite extensive, and in population- 
level research, TNM codes usually are combined to create the 
summary staging categories: local, regional, distant, and if need 
be unknown. Local refers to cancer that is invasive and confined 
to the primary site, regional refers to cancer that has spread to 
regional lymph nodes, and distant refers to cancer that has metas- 
tasized. An example of a stage distribution from SEER is found in 
Table 4.2 [5]. The TNM and summary staging systems include 
categories for in situ disease (the most advanced form of precan- 
cer), though that category is not included in the example. 

The terms early stage, advanced stage, and late stage are frequently 
used when discussing cancer screening. Early stage generally 
refers to cancers that are curable, local-stage cancers, or 
cancers that are in a relatively early phase of their existence. 
Cancer screening aims to detect those cancers. Advanced and late 
stage generally refer to distant-stage cancers, cancers that are not 
curable, or cancers that are expected to be fatal. Cancer screening 
does not aim to detect cancers at late stages as their prognosis is 
unlikely to be improved. 

Table 4.2 Stage distribution of breast cancers diagnosed 2008 to 2014 and 
reported to the SEER 18 registry grouping 

Stage    | Number (a) | Percentage 
Local    | 213,258    | 62 
Regional | 106,629    | 31 
Distant  | 20,638     | 6 
Unknown  | 6879       | 2 
Total    | 343,965    | 100 

(a) SEER reports percentages, but not numbers, by stage. The stage-specific 
numbers in this table were calculated by multiplying the total number of 
breast cancers by the stage-specific percentages 


4.1.4 Case Survival 


Case survival is the time from cancer diagnosis to death from any 
cause, and in assessment of cancer screening is typically mea- 
sured in months or years. Individual case survival is calculated by 
subtracting the date of diagnosis from the date of death. Summary 
measures of case survival are calculated for a series of cancers. 
Case survival can be reported using medians or means, but is 
frequently reported as a percentage of cases alive after a certain 
amount of time, usually 5 years. Relative case survival is typically 
used; it is a measure that takes into account the hypothetical mor- 
tality the cancer patients would have had had they not been diag- 
nosed with cancer. Relative case survival is calculated by dividing 
the number of cancer patients alive at the end of a given period of 
time by the number of individuals in a comparable but cancer-free 
population alive after the same period of time. It is the latter group 
that represents the aforementioned hypothetical mortality. A non- 
relative case survival percentage is calculated by dividing the 
number of living cancer patients by the total number of cancer 
patients, and is smaller than a relative case survival percentage 
because it does not consider that some cancer patients would have 
died of a cause other than cancer had they not been diagnosed 
with cancer. Table 4.3 presents relative and non-relative case sur- 
vival percentages for a fictional sample of 100 cancer patients. 
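
The two percentages can be sketched with the fictional Table 4.3 
counts. The variable names are illustrative. The sketch divides 
proportions rather than raw counts; with two groups of 100 the result 
is the same as dividing 85 by 90, but proportions are the safer 
choice, as an assumption of this sketch, when the groups differ in size. 

```python
# Fictional Table 4.3 counts.
cancer_patients = {"number": 100, "alive_at_5_years": 85}
comparison_group = {"number": 100, "alive_at_5_years": 90}

# Proportion alive at 5 years in each group.
observed = cancer_patients["alive_at_5_years"] / cancer_patients["number"]
expected = comparison_group["alive_at_5_years"] / comparison_group["number"]

non_relative_survival = observed          # ignores deaths expected from other causes
relative_survival = observed / expected   # adjusts for the comparison group's mortality

print(round(100 * non_relative_survival), round(100 * relative_survival))  # 85 94
```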


Table 4.3 Calculating relative and non-relative case survival 

Population                                                   | Number | Number alive 5 years later | Number dead 5 years later because of the cancer | Number dead 5 years later because of other causes 
Cancer patients                                              | 100    | 85                         | 6                                               | 9 
Individuals who are otherwise similar to the cancer patients | 100    | 90                         | 0                                               | 10 

Relative 5-year case survival: 85/90 = 94%. Non-relative 5-year case 
survival: 85/100 = 85%. 

Experience is fictional 


4.2 Definitive Outcomes 

4.2.1 Cause-Specific and All-Cause Mortality 


Mortality reflects the number of individuals who die. As is the 
case with incidence, rates rather than absolute numbers usually 
are reported. Both cause-specific and all-cause mortality rates use 
the number of person-years of individuals at risk of any death as 
the denominator. Cause-specific mortality rates use the number of 
individuals who died of the cause of interest as the numerator. 
All-cause mortality rates use the number of individuals who died of 
any cause. 

It is common to confuse the meanings of mortality measures 
and case survival because they both involve death. The primary 
difference between the two is that case survival includes only 
those individuals who have been diagnosed with cancer. Mortality 
measures include all individuals who are at risk of dying from any 
cause. Put another way, mortality measures include those with 
cancer as well as those without cancer. 

Figure 4.1 may help to explain. N, the number at risk of death, 
includes all individuals. C is the subset of the N individuals who 
were diagnosed with cancer. D is the subset of the C individuals 
who died of cancer. C minus D is the number of individuals with 
cancer who did not die of cancer. A mortality measure would use 
N (or their person-years of experience) as the denominator, while 
a case survival measure would use C as the denominator. As men- 
tioned above, a cause-specific mortality measure would use D as 
the numerator. Figure 4.2 is a modified version of Fig. 4.1 and 
depicts the cascade for all-cause mortality measures. 

Mortality rates reflect two measurable aspects of life: vital status 
and length of life. The use of person-years as the denominator 
enables cause-specific rates to reflect extension of life even if the 
cause of death is the cancer of interest. All-cause mortality rates 
will reflect the extension as well. 

[Figure: cascade of three groups: number at risk for death (N); number 
diagnosed with cancer (C); number who die from cancer (D)] 

Fig. 4.1 The cascade from at risk for death to cause-specific death 

Here are two examples of cancer mortality rates: 


• The SEER age-adjusted mortality rate of breast cancer in 2016 
  was 20.03 per 100,000 women per year or 20.03 per 100,000 
  person-years [2]. 

• The lung cancer mortality rate in the LDCT arm of the NLST 
  was 247 per 100,000 person-years. That rate corresponds to 
  356 lung cancer deaths and 144,103 person-years of 
  experience [3]. 

[Figure: cascade of two groups: number at risk for death (N); number who 
die of any cause (D + other deaths)] 

Fig. 4.2 The cascade from at risk for death to death from any cause 


4.2.2 Calculating Mortality Rates: A Fictional 
Example 


Table 4.4 displays the fictional experience of 100 individuals. All 
are alive on 1/1/18, twenty die on 6/30/18, and the remaining 
80 are alive on 12/31/18. Two pieces of information are needed to 
calculate the all-cause mortality rate: the number of individuals 
who died (numerator) and the collective amount of time that 
individuals were alive (denominator). The numerator clearly is 20, but 
the denominator requires some calculation. We need to separately 
consider those who were alive on 12/31/18 and those who died 
before that date. The 80 individuals who were alive on 12/31/18 
each contribute a full year of time, for a total of 80 person-years. 
The 20 who died did so halfway through the year. Each contributes 
half a year for a total of 10 person-years. The denominator 
equals the sum of the contributions from the two groups: 80 
person-years plus 10 person-years, or 90 person-years. The 
all-cause mortality rate is 20 per 90 person-years, or 22.2 per 100 
person-years. 

Table 4.4 Calculating an all-cause mortality rate for 2018 

Group                    | Number | Person-years contribution for 2018 
Those alive on 1/1/18    | 100    | Not applicable 
Those who die on 6/30/18 | 20     | 10 
Those alive on 12/31/18  | 80     | 80 

Mortality rate: 20/(10 + 80) = 20 per 90 person-years, or 22.2 per 100 
person-years 

Data are fictional 

In some instances, researchers may choose to calculate a sim- 
ple percentage, reflecting the percentage of individuals who died. 
In the Table 4.4 example, the percentage of individuals who were 
alive on 1/1/18 but who died before 12/31/18 is 20%. It is smaller 
than the mortality rate and thus suggests more favorable experi- 
ence. But it does so incorrectly, as it does not take into account 
that the 20 individuals who died did so halfway through the year. 
The simple percentage treats the deaths as if they occurred on the 
last day of the year. While the Table 4.4 example generated only a 
small disagreement in the percentage and mortality rate, the two 
metrics, when calculated using larger numbers of individuals who 
have a range of life lengths, can be meaningfully different from 
one another. 
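
The contrast between the two metrics can be sketched with the 
Table 4.4 numbers. The variable names are illustrative; the half-year 
contribution follows the text's simplification that all 20 deaths 
occur halfway through the year. 

```python
deaths = 20        # died on 6/30/18, halfway through the year
survivors = 80     # alive on 12/31/18

# Each death contributes half a person-year; each survivor a full one.
person_years = deaths * 0.5 + survivors * 1.0

mortality_rate_per_100 = 100 * deaths / person_years
simple_percentage = 100 * deaths / (deaths + survivors)

print(person_years, round(mortality_rate_per_100, 1), simple_percentage)
```

The rate (about 22.2 per 100 person-years) exceeds the simple 
percentage (20%) because the denominator credits the deceased with 
only the time they actually lived. 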

Table 4.5 expands on Table 4.4 by including information on 
cause of death. Five of the individuals who died on 6/30/18 died 
of lung cancer, while the remaining 15 died of another cause. 
When calculating a cause-specific mortality rate, only individuals 
who died of the cause of interest are included in the numerator, 
although all who died contribute the amount of time they were 
alive to the denominator. Table 4.5 shows calculations for the lung 
cancer mortality rate. The numerator is now 5, yet the denominator 
remains at 90. The numerator is smaller because not all deaths 
were due to lung cancer. The denominator is the same as that of 
the all-cause mortality rate because the same amount of time was 
lived. The lung cancer mortality rate is 5 per 90 person-years, or 
5.6 per 100 person-years. It is smaller than the all-cause mortality 
rate because our goal was to measure the rate of dying of lung 
cancer, which is less common than dying of any cause. 

Table 4.5 Calculating a lung cancer mortality rate for 2018 

Group                                                      | Number | Person-years contribution for 2018 
Those alive on 1/1/18                                      | 100    | 
Those who die on 6/30/18 of lung cancer                    | 5      | 2.5 
Those who die on 6/30/18 of a cause other than lung cancer | 15     | 7.5 
Those alive on 12/31/18                                    | 80     | 80 

Lung cancer mortality rate: 5/(2.5 + 7.5 + 80) = 5 per 90 person-years, or 
5.6 per 100 person-years 

Data are fictional 
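
A minimal sketch of the same idea, using the Table 4.5 groups: the 
denominator is shared, and only the numerator depends on the cause of 
death. The data structure and the function name rate_per_100 are 
illustrative assumptions. 

```python
# Fictional Table 4.5 groups: size, years lived in 2018 per person, cause of death.
groups = [
    {"count": 5, "years_each": 0.5, "cause": "lung cancer"},
    {"count": 15, "years_each": 0.5, "cause": "other"},
    {"count": 80, "years_each": 1.0, "cause": None},  # alive on 12/31/18
]

# Everyone contributes the time they were alive, whatever the cause of death.
person_years = sum(g["count"] * g["years_each"] for g in groups)


def rate_per_100(causes):
    """Deaths from the listed causes per 100 person-years."""
    deaths = sum(g["count"] for g in groups if g["cause"] in causes)
    return 100 * deaths / person_years


lung_cancer_rate = rate_per_100({"lung cancer"})
all_cause_rate = rate_per_100({"lung cancer", "other"})
print(person_years, round(lung_cancer_rate, 1), round(all_cause_rate, 1))
```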
5 Population Measures: Cancer Screening’s Impact 


If assessment of cancer screening involved nothing more than 
calculating the outcomes described in Chap. 4, there would be 
little need for this primer. The challenging aspect is the interpre- 
tation of changes in outcomes, both intermediate and definitive, 
that accompany cancer screening. The material in Chap. 5 is 
presented in terms of a change from no population-based cancer 
screening to the establishment of population-based cancer 
screening, even though the same principles apply when an estab- 
lished cancer screening test is replaced by one with improved 
performance measures. Matters specific to the latter scenario are 
discussed further in Chap. 9. 

The three screening phenomena presented in Chap. 2, lead 
time, length-weighted sampling, and overdiagnosis, feature 
prominently in Chap. 5. The reader may wish to review that mate- 
rial prior to proceeding. 

The material on cancer incidence presented in this chapter is 
pertinent only to early detection cancer screening. The impact of 
cancer prevention screening on incidence is presented in 
Chap. 8. 
5.1 Cancer Screening’s Impact 
on Intermediate Outcomes 


5.1.1 Cancer Incidence 


Cancer incidence is expected to increase when cancer screening is 
introduced. Lead time results in diagnosis at an earlier point in 
time, creating a bunching effect as the shifted screen-detected 
cancers are diagnosed contemporaneously with symptom- 
detected cancers. Also adding to the increase are the overdiag- 
nosed cancers. 

Cancer screening cannot lead to a reduction in cause-specific 
mortality if cancer incidence does not increase. If cancer inci- 
dence remains stable as cancer screening is introduced and uptake 
increases, diagnoses are not occurring earlier, and therefore prog- 
nosis cannot change. An increase in cancer incidence does not 
guarantee a cause-specific mortality reduction. The increase may 
be due to detection of overdiagnosed cancers or detection of can- 
cers that would have the same prognosis regardless of detection in 
Phase B or Phase C (as defined in Chap. 2). 


5.1.2 Cancer Incidence Example 


Figure 5.1 displays, in a very simplistic manner, how introduction 
of cancer screening increases the number of cancers that are 
diagnosed. In the absence of cancer screening, three cancers are 
diagnosed due to symptoms in 2017 (the X cancers) and three cancers 
are diagnosed due to symptoms in 2018 (the Y cancers). In the 
presence of cancer screening, the three X cancers are either screen 
or symptom detected in 2017, the three Y cancers are screen 
detected in 2017, and the two Z cancers, the overdiagnosed 
cancers, also are diagnosed in 2017. The number of cancers 
diagnosed in 2017 in the presence of cancer screening is 5 more than 
would have been diagnosed in the absence of cancer screening. If 
this fictional population included 1000 individuals, the incidence 
rate for 2017 would be 3/1000 per year in the absence of cancer 
screening versus 8/1000 per year in the presence of cancer 
screening. 

                               2017        2018 
In the absence of screening    XXX         YYY 
In the presence of screening   XXXYYYZZ 

Fig. 5.1 Cancer incidence in the presence and absence of cancer screening 

In the absence of cancer screening, X and Y cancers are symptom 
detected, with X cancers diagnosed in 2017 and Y cancers 
diagnosed in 2018. Z cancers are never diagnosed. In the presence 
of cancer screening, X cancers are still detected in 2017 though 
they may be screen or symptom detected. Y cancers are now 
screen detected in 2017. Z cancers (overdiagnosed cancers) are 
screen detected in 2017. Data are fictional. 
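
The 2017 incidence rates for Fig. 5.1 can be sketched as a simple 
count of diagnoses per year. The dictionary layout and the function 
name incidence_per_1000 are illustrative assumptions; the population 
of 1000 comes from the example itself. 

```python
population = 1000

# Cancers diagnosed in each calendar year, as depicted in Fig. 5.1.
without_screening = {2017: ["X", "X", "X"], 2018: ["Y", "Y", "Y"]}
with_screening = {2017: ["X", "X", "X", "Y", "Y", "Y", "Z", "Z"]}


def incidence_per_1000(diagnoses_by_year, year):
    """Cancers diagnosed in the given year per 1000 individuals."""
    return 1000 * len(diagnoses_by_year.get(year, [])) / population


rate_without = incidence_per_1000(without_screening, 2017)
rate_with = incidence_per_1000(with_screening, 2017)
print(rate_without, rate_with)  # 3.0 8.0
```

Of the 5 additional diagnoses in 2017, three are Y cancers shifted 
forward by lead time and two are overdiagnosed Z cancers. 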

Figure 5.1 depicts what cancer screening is intended to do: 
detect cancers at an earlier point in time. Screen detection of the 
Y cancers in Fig. 5.1 may lead to more favorable experiences for 
these patients, such as simpler treatment and better prognosis. 
Their detection could lead to a reduction in cause-specific mortal- 
ity, although at the point of diagnosis, it is impossible to know. Of 
course, conjecture is possible and frequently happens. For exam- 
ple, diagnosis at an earlier point in time may be interpreted as 
advantageous, which can then be prematurely interpreted to mean 
that cancer screening will lead to a reduction in cause-specific 
mortality. 

Figure 5.1 does not depict what happens in the presence of 
cancer screening in 2018, but if the graph were extended for addi- 
tional years, the same general pattern of shifting should hold. The 
specifics of the shift depend on how cancer screening interferes 
with cancer’s natural history, other changes in cancer’s natural 
history, as well as changes in cancer screening uptake and perfor- 
mance. Barring any drastic changes in the three, the characteris- 
tics of the shift, such as degree and speed, should be fairly similar 
and stabilize after no more than a few screening rounds. 
5.1.3 Stage at Diagnosis 


Cancer screening aims to detect cancer when prognosis is more 
favorable than it would have been if detected due to symptoms. 
Prognosis usually is related to stage at diagnosis. Most local-stage 
cancers are curable with resection, though these days, some 
regional- and distant-stage cancers can be cured with surgery, 
chemotherapy, immunotherapy, radiation, or a combination. As 
more non-local-stage cancers become curable, cancers diagnosed 
at those stages could have similar prognosis as those diagnosed at 
a local stage. But in today’s cancer world, it is fair to assume that 
cure is most likely for local cancers and that those with treated 
local cancers live the longest. 

The number of local-stage cancers is expected to increase 
when cancer screening is introduced. Soon after, a decrease in 
the number of regional- or distant-stage cancers is expected, as 
some cancers that were destined to be diagnosed at a later stage 
in the absence of cancer screening will have been detected at an 
earlier stage in the presence of cancer screening. The phrases 
stage shift and down staging are used to describe that situation. 
The phrases should be used to refer to changes in numbers, not 
changes in percentages. While it is true that a stage shift will 
lead to a change in the percentage of cancers for a given stage, 
percentages can be misleading if the number of local-stage cancers 
increases absent a decrease in regional- and distant-stage 
cancers, which can happen when cancer screening leads to 
overdiagnosis. 

If a stage shift does not occur, cancer screening will not lead to 
a reduction in cause-specific mortality. Lack of a stage shift indi- 
cates no movement in the stage at diagnosis and thus no improve- 
ment in prognosis. But the presence of a stage shift does not 
guarantee a cause-specific mortality reduction. A stage shift 
reflecting a change from one stage to another that has similar 
prognosis would confer no reduction in cause-specific mortality. 
Length-weighted sampling could produce that situation in the 
instance of curable disease, while lead time could produce that 
situation in the instance of incurable disease. 
Discussions of stage shifts have typically focused on the need to 
observe an increase in early-stage cancer rather than a reduction in 
late-stage cancer. But both are necessary for a cause-specific 
mortality reduction to be possible, and a reduction in distant-stage 
cancer is unlikely to be due to lead time, length-weighted sampling, or 
overdiagnosis. The use of distant-stage cancer as a possible 
surrogate for cause-specific mortality is discussed in Chap. 9. 


5.1.4 Stage at Diagnosis Example 


Table 5.1 displays fictional stage experience of the Fig. 5.1 
cancers in the absence and presence of cancer screening. Scenario 1 
excludes overdiagnosed cancers, while Scenarios 2 and 3 include 
them. Scenarios 1 and 2 present a favorable change: two cancers 
that, in the absence of cancer screening, would have been 
diagnosed at a distant stage are, in the presence of cancer screening, 
diagnosed at a local stage. In Scenario 3, the two distant-stage 
cancers remain as such even in the presence of cancer screening. 

The numbers of local-stage cancers increase and the numbers 
of distant-stage cancers decrease in Scenarios 1 and 2. The inclu- 
sion of the overdiagnosed cancers in Scenario 2 presents a more 
favorable picture than in Scenario 1, but it is an overly-optimistic 
picture, as the overdiagnosed cases cannot contribute to a cause- 
specific mortality reduction, should one exist. In Scenario 3, the 
distant-stage cancers are detected at the same stage, regardless of 
cancer screening. Screening cannot reduce cause-specific mortal- 
ity as no stage shift occurred; rather, it has led to the unnecessary 
detection of the two overdiagnosed cancers. Note that in Scenario 
3 the stage-specific numbers do not suggest down staging, but the 
percentages, when examined alone, do. 
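
The contrast between numbers and percentages can be sketched with the 
Table 5.1 counts. The layout and the function name distribution are 
illustrative assumptions. Percentages here are computed to one decimal 
place, so they may differ by a point from the whole-number values 
printed in the table. 

```python
STAGES = ("Local", "Regional", "Distant")

# Counts of cancers by stage, taken from Table 5.1 (fictional data).
absence = {"Local": 2, "Regional": 1, "Distant": 3}
presence = {
    1: {"Local": 4, "Regional": 1, "Distant": 1},
    2: {"Local": 6, "Regional": 1, "Distant": 1},
    3: {"Local": 4, "Regional": 1, "Distant": 3},
}


def distribution(counts):
    """Return {stage: (number, percentage to one decimal)}."""
    total = sum(counts.values())
    return {s: (counts[s], round(100 * counts[s] / total, 1)) for s in STAGES}


baseline = distribution(absence)
scenario_3 = distribution(presence[3])

# Scenario 3: the number of distant-stage cancers is unchanged,
# yet their percentage falls because overdiagnosed local cancers were added.
print(baseline["Distant"], scenario_3["Distant"])  # (3, 50.0) (3, 37.5)
```

Comparing numbers first, and percentages only second, guards against 
reading the Scenario 3 pattern as down staging. 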


5.1.5 Case Survival 


Measures of case survival will increase when cancer screening is 
introduced. Cancer screening leads to increased case survival 
because, for screen-detected cancers, the date of diagnosis occurs 
earlier (by the amount of lead time) than it would have in the 
absence of cancer screening. Yet our ability to interpret changes in 
case survival in the presence of cancer screening, relative to case 
survival in the absence of cancer screening, is impaired because 
we do not know what the date of diagnosis or date of death would 
have been in the absence of cancer screening for a given 
individual. The fictional Y and Z cancers in Fig. 5.1, in conjunction 
with additional fictional experience in Table 5.2, will be used to 
demonstrate how case survival could change with cancer screening. 
Mean and median case survival are presented for ease of 
explanation, although relative case survival will change as well. 

If case survival does not increase after cancer screening’s intro- 
duction, cancer screening will not lead to a reduction in cause- 
specific mortality. A lack of increase indicates that diagnoses are 
not occurring earlier and that lives are not being lengthened. It is 
virtually impossible, however, for case survival not to increase 
when cancer screening occurs, because shifting the date of diagno- 
sis to an earlier point in time is at the core of cancer screening. An 
increase in case survival does not guarantee a cause-specific mor- 
tality reduction, however. Lead time is usually responsible in that 
instance, but length-weighted sampling and overdiagnosis can lead 
to detection of cancers that will have the longest case survival 
because they have the most favorable prognosis. 

Case survival seems to be the most frequently misinterpreted 
intermediate outcome. Increases in 5-year case survival are quoted 
as evidence that cancer screening saves lives, but lead time is 
rarely mentioned as a contributing factor and possible explanation 
for the observation. 


Table 5.1 Stage distributions in the absence and presence of cancer
screening

               | In the absence of      | In the presence of
               | cancer screening       | cancer screening
Disease stage  | Cancers  | N (%)       | Cancers           | N (%)

Scenario 1: X and Y cancers only (non-overdiagnosed); two distant
cancers are now detected at a local stage
Local          | X, Y     | 2 (34%)     | X, Y, Y, Y        | 4 (67%)
Regional       | X        | 1 (17%)     | X                 | 1 (17%)
Distant        | X, Y, Y  | 3 (50%)     | X                 | 1 (17%)

Scenario 2: X, Y, and Z cancers (overdiagnosed and non-overdiagnosed
cancer); two distant cancers are now detected at a local stage
Local          | X, Y     | 2 (34%)     | X, Y, Y, Y, Z, Z  | 6 (75%)
Regional       | X        | 1 (17%)     | X                 | 1 (13%)
Distant        | X, Y, Y  | 3 (50%)     | X                 | 1 (13%)

Scenario 3: X, Y, and Z cancers (overdiagnosed and non-overdiagnosed
cancer); two distant cancers are detected at the same stage as in the
absence of cancer screening
Local          | X, Y     | 2 (34%)     | X, Y, Z, Z        | 4 (50%)
Regional       | X        | 1 (17%)     | X                 | 1 (13%)
Distant        | X, Y, Y  | 3 (50%)     | X, Y, Y           | 3 (38%)

X, Y, and Z cancers are defined in Fig. 5.1. Data are fictional


5.1.6 Case Survival Example


Table 5.2 presents date of diagnosis, date of death, and case
survival for the Y and Z cancers in the presence and absence of
cancer screening. The experience of each Y cancer represents a
different way that lead time can change case survival. Y1 is screen
detected but the date of death does not change. Case survival,
which increases from 12 months to 20 months, suggests a benefit
though. Y2 is screen detected but dies 3 months earlier than he or
she would have in the absence of cancer screening, perhaps due to
toxicity of cancer treatment. Case survival increases, though,
from 10 months to 15 months because of lead time. Y3 benefits
from screen detection. Case survival increases from 32 to
66 months, though the extension of life is only 26 months. The
remainder of the 34-month increase in case survival, 8 months, is
lead time.

Table 5.2 Case survival in the absence and presence of cancer screening

        | In the absence of cancer screening    | In the presence of cancer screening
        | Date of         | Date of | Case         | Date of   | Date of  | Case
Cancer  | diagnosis       | death   | survival     | diagnosis | death    | survival
Y1      | 2/1/18          | 2/1/19  | 12 months    | 6/1/17    | 2/1/19   | 20 months
Y2      | 2/1/18          | 12/1/18 | 10 months    | 6/1/17    | 9/1/18   | 15 months
Y3      | 2/1/18          | 10/1/20 | 32 months    | 6/1/17    | 12/1/22  | 66 months
Z1      | Never diagnosed | 6/1/21  | Not relevant | 6/1/17    | 6/1/21   | 48 months
Z2      | Never diagnosed | 9/1/21  | Not relevant | 6/1/17    | 10/1/20  | 36 months

X, Y, and Z cancers are defined in Fig. 5.1. Data are fictional
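
The split of each Y cancer's change in case survival into lead time and change in length of life can be reproduced from the dates in Table 5.2. The following is a minimal sketch; the helper names are hypothetical, and the date of death for Y1 in the absence of screening is assumed to be 2/1/19, as implied by its 12-month case survival.

```python
from datetime import date


def months_between(start, end):
    """Whole months from start to end; every date here falls on the 1st."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# (diagnosis, death) without screening, then with screening (Table 5.2).
# Assumption: Y1's death date without screening is inferred from its
# 12-month case survival.
cases = {
    "Y1": ((date(2018, 2, 1), date(2019, 2, 1)),
           (date(2017, 6, 1), date(2019, 2, 1))),
    "Y2": ((date(2018, 2, 1), date(2018, 12, 1)),
           (date(2017, 6, 1), date(2018, 9, 1))),
    "Y3": ((date(2018, 2, 1), date(2020, 10, 1)),
           (date(2017, 6, 1), date(2022, 12, 1))),
}


def decompose(without, with_screening):
    """Return (lead time, change in length of life, change in case survival)."""
    dx0, death0 = without
    dx1, death1 = with_screening
    lead_time = months_between(dx1, dx0)          # diagnosis moved earlier
    life_change = months_between(death0, death1)  # + later death, - earlier
    survival_change = months_between(dx1, death1) - months_between(dx0, death0)
    return lead_time, life_change, survival_change


results = {name: decompose(*pair) for name, pair in cases.items()}
for name, (lead, life, change) in results.items():
    print(f"{name}: case survival {change:+d} months = "
          f"lead time {lead} + change in length of life {life:+d}")
```

In every case the change in case survival equals lead time plus the change in length of life, which is why an increase in case survival alone says nothing about whether life was extended.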

The Z cancers do not have a measure of case survival in the 
absence of cancer screening because they were overdiagnosed. 
Detection of an overdiagnosed cancer cannot result in extension 
of life due to treatment. It can result, however, in premature death. 
Early death can occur in the instance of an adverse event related 
to cancer screening, diagnostic evaluation, or treatment. In addi- 
tion, cancer patients have been shown to be at elevated risk of 
suicide [1]. 

Z1 is diagnosed due to cancer screening but his or her date of 
death does not change. Z2, on the other hand, dies sooner than he 
or she would have in the absence of cancer screening. Such a situ- 
ation needs to be considered when weighing benefits and harms of 
cancer screening. It also is possible that the experience of having 
cancer will lead to lifestyle changes that improve overall health 
and extend life. Both situations would be reflected in mortality 
rates. The impact of lifestyle changes that do not affect length of 
life yet lead to improved quality of life is not usually considered
when evaluating cancer screening efficacy or effectiveness. 

In the absence of cancer screening, the three non-overdiagnosed 
cancers would have a median case survival of 12 months and 
mean case survival of 18 months. In the presence of cancer screen- 
ing, the 5 detected cancers would have median case survival of 
36 months and mean case survival of 37 months. Yet only 1 of 5 
screen-detected cases lived longer than he or she would have in the
absence of cancer screening. 
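
These summary figures can be reproduced from the case survival column of Table 5.2. The sketch below takes the survival values, in months, exactly as printed in the table; the variable names are illustrative.

```python
from statistics import mean, median

# Case survival in months, as printed in Table 5.2 (fictional data).
no_screening = {"Y1": 12, "Y2": 10, "Y3": 32}            # Z cancers never diagnosed
screening = {"Y1": 20, "Y2": 15, "Y3": 66, "Z1": 48, "Z2": 36}

# Cancers whose date of death is later with screening than without it.
lived_longer = {"Y3"}

median_without = median(no_screening.values())
mean_without = mean(no_screening.values())
median_with = median(screening.values())
mean_with = mean(screening.values())

print(f"Without screening: median {median_without}, mean {mean_without} months")
print(f"With screening:    median {median_with}, mean {mean_with} months")
print(f"Lived longer: {len(lived_longer)} of {len(screening)} screen-detected cases")
```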


5.2 Cancer Screening’s Impact on Definitive 
Outcomes 


The two definitive outcomes in cancer screening are cause- 
specific mortality and all-cause mortality. The mortality outcomes 
are called definitive because it is impossible for them to be biased 
by the three screening phenomena, as is discussed in the next sec- 
tion of this chapter. That does not mean, however, that they cannot 
be affected by other factors, something that may not have been 
appreciated when the term definitive was bestowed upon them 
many years ago. 


5.2.1 Mortality Rates and the Three Screening 
Phenomena 


Cause-specific and all-cause mortality rates are not affected by 
lead time, length-weighted sampling, or overdiagnosis. They are 
not affected by lead time because date of diagnosis is not used to 
calculate mortality rates. They are not affected by length-weighted 
sampling or overdiagnosis because deaths are not restricted to 
those individuals whose cancer was screen detected. 

Recall from Chap. 4 that the numerator in cause-specific mor- 
tality rates includes all deaths due to the cause of interest and the 
numerator in all-cause mortality rates includes all deaths. The 
denominator includes all persons at risk of death, not only those 
who were screened. It is for these reasons that mortality rates
reflect the impact of cancer screening on the entire population 
eligible to be screened. They incorporate cancer screening’s 
successes as well as its failures, should either or both exist. 
Successes are extension of life among those screened. Failures are 
missed opportunities for early detection due to many factors, 
including limitations of the test, shortcomings in test interpreta- 
tion, and non-adherence to cancer screening or diagnostic evalua- 
tion for a positive test. 
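
The structure of the two rates can be made concrete with a small sketch. It assumes, as a simplification, that each rate is deaths divided by person-years at risk; the `Person` record and the tiny population are hypothetical. Note that neither the date of diagnosis nor screening status enters the calculation.

```python
from dataclasses import dataclass


@dataclass
class Person:
    person_years: float     # time at risk of death during follow-up
    died: bool
    died_of_cancer: bool    # death attributed to the cancer of interest
    screened: bool          # recorded, but not used in either rate


def mortality_rates(population, per=100_000):
    """Return (cause-specific rate, all-cause rate) per `per` person-years."""
    person_years = sum(p.person_years for p in population)
    all_deaths = sum(p.died for p in population)
    cancer_deaths = sum(p.died and p.died_of_cancer for p in population)
    return per * cancer_deaths / person_years, per * all_deaths / person_years


# Hypothetical population: screened and unscreened people both count.
population = [
    Person(10, died=False, died_of_cancer=False, screened=True),
    Person(10, died=False, died_of_cancer=False, screened=False),
    Person(5, died=True, died_of_cancer=True, screened=False),
    Person(5, died=True, died_of_cancer=False, screened=True),
]

cause_specific, all_cause = mortality_rates(population)
print(f"Cause-specific: {cause_specific:.0f} per 100,000 person-years")
print(f"All-cause:      {all_cause:.0f} per 100,000 person-years")
```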


5.2.2 Cause-Specific Mortality Rates 


The calculation of a cause-specific mortality rate to assess cancer 
screening is straightforward, as was demonstrated in Chap. 4. The 
numerator includes all individuals who died of the cause of inter- 
est. The underlying assumption in the calculation is that the 
numerator correctly captures all relevant deaths. Unfortunately, 
errors in cause of death assignment are known to occur [2]. The 
cause of death recorded on the death certificate may not be the 
true cause of death. 

When attempting to assess whether cancer screening can 
reduce cause-specific mortality, it is advised to classify any death 
that occurred as an adverse effect of the cancer screening process 
as a cause-specific death. The reason for that is to measure all 
screening failures. Any death that occurs due to the cancer for 
which screening is occurring is clearly a failure of the cancer 
screening process. However, any death due to an adverse effect of 
the cancer screening process also should be considered a failure 
because it would not have happened (or might have happened 
later) if cancer screening had not occurred. Identifying those 
deaths is a challenge because the death certificate is unlikely to 
indicate the sort of information that is necessary to link the death 
to the cancer screening process. 

The next section addresses two phenomena that affect the abil- 
ity of cause-specific mortality rates to measure what we want 
them to measure. 


5.2.3 Sticking Diagnosis, Slippery Linkage, 
and Assessment of Cancer Screening 


Sticking diagnosis occurs when the cancer of interest is erroneously 
assigned to be the cause of death, which can happen due to cancer’s 
reputation for lethality. Sticking diagnosis can happen in the 
instance of screen- or symptom-detected cancer, but because inci- 
dence rates typically increase with cancer screening, sticking diag- 
nosis generally leads to cause-specific mortality rates that are higher 
than they should be. In that instance, cancer screening could appear 
to not reduce cause-specific mortality when it actually does. 

Slippery linkage occurs when death certificates do not capture 
a direct or downstream consequence of cancer screening, or do 
not capture it in such a way that it can be linked to cancer screen- 
ing. Slippery linkage leads to cause-specific mortality rates that 
are lower than they should be and could lead to the conclusion that 
cancer screening does reduce cause-specific mortality when it 
actually does not. Slippery linkage would be at work in the 
instance of death due to a bowel perforation sustained during a 
screening colonoscopy, or development of fatal breast cancer 
caused by radiation from extensive imaging for an abnormality 
observed on lung cancer screening. In the former example, screen- 
ing played a part in the death, and while the death certificate is 
likely to note a medical misadventure, it probably will not reflect 
the reason for the colonoscopy. In the latter example, it would be 
all but impossible to recognize the death as a downstream effect of 
cancer screening. 
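
The opposing directions of the two errors can be illustrated with a rate ratio calculation. All counts below are hypothetical and chosen only to show the direction of the distortion, not its likely size.

```python
def rate_ratio(deaths_int, py_int, deaths_ctl, py_ctl):
    """Cause-specific mortality rate ratio, intervention arm over control arm."""
    return (deaths_int / py_int) / (deaths_ctl / py_ctl)


PY = 100_000  # hypothetical person-years in each arm

# Sticking diagnosis: deaths from other causes wrongly assigned to the cancer.
# The intervention arm has more diagnosed cases, so more deaths are at risk
# of being misassigned (15 versus 5 here, both hypothetical).
true_rr = rate_ratio(80, PY, 100, PY)
sticking_rr = rate_ratio(80 + 15, PY, 100 + 5, PY)

# Slippery linkage: 10 deaths caused by the screening process (hypothetical)
# should count as cause-specific deaths but are not captured as such.
intended_rr = rate_ratio(80 + 10, PY, 100, PY)
recorded_rr = rate_ratio(80, PY, 100, PY)

print(f"Sticking diagnosis: true {true_rr:.2f}, observed {sticking_rr:.2f}")
print(f"Slippery linkage:   intended {intended_rr:.2f}, recorded {recorded_rr:.2f}")
```

Sticking diagnosis moves the observed ratio toward 1, hiding a real benefit, while slippery linkage moves it away from 1, overstating the benefit.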

The section of the US standard death certificate that covers 
cause of death is presented as Fig. 5.2. Note that immediate 
causes, underlying causes, and significant medical conditions can 
be listed on the death certificate. Oftentimes a single underlying 
cause of death is derived using all entries according to rules set 
forth by the National Center for Health Statistics (NCHS); that 
cause of death is defined by the World Health Organization as “the 
disease or injury which initiated the train of morbid events leading 
directly to death, or the circumstances of the accident or violence 
which produced the fatal injury” [3]. 

In the US, researchers can obtain certain death certificate fields
as long as the scientific rationale is strong. Requestors often for- 
get, however, that death certificates are not completed with bio- 
medical research in mind. To use death certificate data for research 
purposes requires an understanding of the rules used to complete 
them and recognition of their limitations. Additional information 
about death certificate completion and cause of death coding can 
be found on the NCHS National Vital Statistics System
website [3].


5.2.4 Cause of Death Review 


To arrive at accurate cause of death information, it may be neces- 
sary to review medical records that document the events leading 
to death. A review of every death could be done, though a 
thoughtfully chosen, algorithm-driven subset of deaths, as was
done in the National Lung Screening Trial (NLST) [4], will save 
time and effort. Death review is usually a large undertaking, given 
the medical records that must be obtained and the person-power 
to review them. Nevertheless, death review can help to reverse 
death certificate cause of death assignment errors caused by stick- 
ing diagnosis and slippery linkage. 


5.2.5 Cause-Specific Mortality Rates: Definitive 
Enough? 


Given the possibility of sticking diagnosis and slippery linkage, it 
is fair to question whether cause-specific mortality outcomes are 
definitive. Obviously no outcome will be perfect, and by review- 
ing medical records one may be able to circumvent much of the 
error that is possible with assigned cause of death. The
“definitiveness” for a given cancer is primarily dependent on the extent of
sticking diagnosis and slippery linkage that goes uncorrected. 
Cause of death in the NLST was expected to be affected by 
slippery linkage and sticking diagnosis given comorbidities that 
are often experienced by heavy smokers and the perceived 
lethality of lung cancer. However, a comparison of death
certificate cause of death and death review cause of death indicated that
disagreement was minimal [5]. The authors concluded that death
review may not be necessary in lung cancer screening.

It is possible to create a scenario, however far-fetched, in 
which a cause-specific mortality reduction could be explained by 
something other than cause of death errors created by slippery 
linkage. A reduction in mortality could be due, for example, to 
rapid elimination of a powerful risk factor or rapid introduction 
of a highly effective treatment. Such dramatic changes would 
have to be timed just so and be highly correlated with the act of 
being screened for the cancer of interest to explain away what 
appears to be a beneficial effect of population-based cancer 
screening. Given that the cancer landscape has never been a fast- 
changing one, that scenario is unlikely. Even cigarette smoking, 
an exceptionally strong cancer risk factor, took years to make its 
effect known, and universal smoking cessation, should it ever 
occur, also would take years for its impact to be realized. Certain 
molecularly targeted cancer therapies appear to be miracle cures, 
but they are available for only a few tumor types. The impact of 
concurrent changes on assessment of cancer screening tests is 
discussed further in Chap. 9. 

Definitive outcomes are considered by most to be superior to 
intermediate outcomes when assessing the ability of cancer 
screening to reduce cause-specific mortality. 


5.2.6 All-Cause Mortality 


All-cause mortality rates are not affected by sticking diagnosis 
and slippery linkage because no cause of death is necessary to 
calculate them. Yet all-cause mortality is not a practical outcome 
in assessment of most cancer screening tests. Reduction in cause- 
specific mortality of a typical magnitude (perhaps about 20%) 
will lead to a small relative reduction in all-cause mortality, 
because death due to a single cancer usually represents a small 
percentage of all deaths. 

The NLST was an exception: a statistically significant 20%
lung cancer mortality reduction was accompanied by a statisti- 
cally significant 7% all-cause mortality reduction. However, lung 
cancer deaths accounted for about 25% of all deaths, and when 
those deaths were excluded, the reduction in all-cause mortality 
was no longer statistically significant. Results from the Prostate, 
Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO) are 
more in line with what typically happens. The trial observed a 
statistically significant 26% reduction in colorectal cancer mortal- 
ity, though the colorectal cancer deaths represented only about 
2% of all deaths. A statistically nonsignificant reduction in all-cause mortality
of about 2% was observed, even with accumulation of over
800,000 person-years in each arm.
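
A rough calculation shows why the all-cause reduction is so much smaller than the cause-specific one. The sketch assumes that mortality from all other causes is unchanged by screening and that the share of deaths due to the cancer is measured in the absence of screening; both are simplifying assumptions, and the result is back-of-the-envelope arithmetic, not a re-analysis of either trial.

```python
def expected_all_cause_reduction(cause_specific_reduction, share_of_deaths):
    """Relative all-cause reduction if only deaths from the cancer change.

    Assumes other-cause mortality is the same with and without screening.
    """
    return cause_specific_reduction * share_of_deaths


# Inputs are the approximate figures quoted in the text.
nlst = expected_all_cause_reduction(0.20, 0.25)
plco_colorectal = expected_all_cause_reduction(0.26, 0.02)

print(f"NLST: about {nlst:.1%} expected all-cause reduction")
print(f"PLCO (colorectal): about {plco_colorectal:.1%} expected all-cause reduction")
```

Under these assumptions a cancer that causes a quarter of all deaths can move all-cause mortality by a few percent, whereas one that causes 2% of deaths moves it by well under 1%.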

Randomized controlled trials of cancer screening that utilize 
an all-cause mortality outcome would require extremely large 
numbers of individuals to have the necessary statistical power to 
detect typical mortality reductions. Large simple trials with an 
all-cause mortality outcome have been proposed [6, 7] but have 
their own shortcomings. Large numbers of screening centers 
might allow for recruitment of hundreds of thousands of partici- 
pants, but would require more autonomy on the part of those 
screening centers, leading to challenges regarding rigor, such as 
uniform application of the screening protocol. A diffuse trial 
structure would make tracking factors that impact the outcome, 
such as contamination, difficult. 
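
The scale of the problem can be sketched with a standard sample size approximation for comparing two proportions. This is a textbook approximation chosen for illustration, not necessarily the method used in the proposals cited above, and the risks below are hypothetical: a 1% risk of death from the cancer and a 10% risk of death from any cause in the control arm, with screening reducing cancer deaths by 20%.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control, p_intervention, alpha=0.05, power=0.90):
    """Approximate participants per arm for a two-sided test of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = (p_control * (1 - p_control)
                + p_intervention * (1 - p_intervention))
    effect = p_control - p_intervention
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical risks over the follow-up period.
n_cause_specific = n_per_arm(0.010, 0.008)   # 20% fewer cancer deaths
n_all_cause = n_per_arm(0.100, 0.098)        # same lives saved, all deaths counted

print(f"Cause-specific outcome: about {n_cause_specific:,} per arm")
print(f"All-cause outcome:      about {n_all_cause:,} per arm")
```

The absolute difference in risk is identical for the two outcomes, but the all-cause outcome carries far more random variation from unrelated deaths, so roughly ten times as many participants are needed in this example.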


6 Experimental Research Designs


The first five chapters of this primer present important concepts in 
cancer screening and evaluation of its data. Examples were pro- 
vided to reinforce concepts and interpretation, but most were lim- 
ited, fictional, and not intended to demonstrate how cancer 
screening efficacy and effectiveness are formally evaluated. 
Chapters 6 and 7 present the research study designs that are used 
to generate the data necessary for cancer screening assessment. 
Design features, analysis features, and strengths and weaknesses 
will be presented for each. A synopsis of at least one published 
report, along with its reference, will be provided for each design. 
Statistical theory will not be discussed. 

There are two classes of study designs: experimental and 
observational. Randomized controlled trials (RCTs) are experi- 
mental study designs and are discussed in this chapter. All other 
study designs presented in this primer are observational. They are 
discussed in Chap. 7. In general, efficacy is assessed using RCTs, 
while effectiveness is assessed using observational designs, 
though exceptions exist. Recall from Chap. 1 that efficacy refers 
to the ability of cancer screening to reduce cause-specific mortal- 
ity in a highly controlled and near ideal setting, and effectiveness 
refers to the ability of cancer screening to reduce cause-specific 
mortality in a traditional community health care setting, one that 
provides numerous and varied services and faces typical US 
health care challenges. Pragmatic RCTs, which will be discussed, 
are conducted in community settings. They usually are classified
as effectiveness research but are presented in this chapter given 
their experimental nature. 

Readers who would like to learn more about experimental 
research can consult Fundamentals of Clinical Trials, by Friedman, 
Furberg, and DeMets [1].


6.1 An Overview of Experimental Study 
Designs 


RCTs are experimental because the intervention is assigned at 
random rather than chosen by the study participant or study 
researcher. Randomization can occur individually for each par- 
ticipant (individual-level randomization) or for entities (cluster- 
level randomization). Most RCTs are composed of two groups, 
referred to as trial arms. When the number of participants is large 
enough, randomization will create, with high probability, trial 
arms that are equivalent prior to administration of the interven- 
tion. Equivalent means that the distribution of all risk and protec- 
tive factors, both measured and unmeasured, is the same in each 
trial arm. Large enough means that the trial has adequate statisti- 
cal power, which can be determined by published formulas [2]. 
The arm that does not receive the intervention is treated as the 
counterfactual experience of the intervention arm, which is the 
hypothetical experience that the intervention arm would have had 
if the intervention had not been administered. It is the counterfac- 
tual principle that allows the outcome to be fully and solely attrib- 
utable to the intervention, as randomization greatly minimizes the 
possibility of confounding. In the context of cancer screening, 
confounding occurs when a third factor is related to both screen- 
ing activity and cause-specific mortality, and will be discussed in 
detail in Chap. 7. 

All RCTs are prospective in nature. Individual-level and 
cluster-level RCTs share many features. Those features will be 
discussed in the context of individual-level trials. The manners in 
which cluster-level RCTs differ will be presented afterwards. 

Pragmatic RCTs, a type of experimental design used in
patient-centered research, will be discussed at the end of the chap- 
ter. Pragmatic RCTs incorporate randomization but allow for 
crossover (that is, assignment to the other trial arm) if the random- 
ization assignment is counter to patient preference. 


6.2 Individual-Level Randomized Controlled 
Trials of Screening 


6.2.1 Design Features 


Individual-level cancer screening RCTs involve randomization of 
each participant to a trial arm. RCTs have at least one intervention 
arm and one control arm. For simplicity’s sake, a trial with one 
intervention arm and one control arm will be used to present this 
chapter’s material. 

Intervention arm participants are offered the screening test or 
screening regimen that is hypothesized to be of benefit. Control 
arm participants are offered either no cancer screening test or can- 
cer screening with the standard of care screening test or regimen. 
Control arm participants who are offered no cancer screening may 
be offered an unrelated exam, such as a glaucoma exam, to engen- 
der good will and to facilitate follow up for trial outcomes. 

Ascertainment of all information, but most importantly inter- 
mediate and definitive outcomes, must be conducted with the 
same amount of rigor for each arm. Death review should be con- 
sidered. Death reviewers should be blinded to trial arm. 

An RCT is designed to have a pre-specified number of screen- 
ing rounds and years of follow-up. Screening rounds in an RCT 
are typically called T0, T1, and so on. T0 refers to the first screen
and also may be called the prevalence screen, with later screens 
called incidence screens. A stop-screen RCT is one in which fol- 
low-up continues after screening stops. All RCTs should have 
interim analysis and data monitoring plans so that a trial can be 
stopped early if evidence is overwhelming that the intervention is
efficacious or that it is not.


6.2.2 Analysis Features


The primary outcome in a cancer screening individual-level RCT 
is a cause-specific mortality rate ratio (and its 95% confidence 
interval), which is the ratio of the cause-specific mortality rate in 
the intervention arm to the cause-specific mortality rate in the 
control arm. Rate ratios that are statistically significant and lower 
than 1 indicate that the intervention reduced cause-specific mor- 
tality relative to whatever was received (if anything) by the con- 
trol arm. A rate ratio that is not significantly different from 1 
indicates that there is no evidence to suggest that the intervention 
reduces cause-specific mortality, relative to whatever was received 
(if anything) by the control arm. An all-cause mortality rate ratio 
usually will be reported as well, although as discussed in Chap. 5, 
cancer screening RCTs rarely have the statistical power to detect 
a significant reduction in all-cause mortality because death due to 
the cancer of interest usually represents a small percentage of all 
deaths. Intermediate outcomes often are reported as well. 
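
The rate ratio and its confidence interval can be computed from four numbers. The sketch below uses a common large-sample interval on the log scale, which assumes that death counts behave like Poisson counts; that choice and the trial counts are assumptions made for illustration.

```python
import math


def rate_ratio_ci(deaths_int, py_int, deaths_ctl, py_ctl, z=1.96):
    """Mortality rate ratio (intervention / control) with an approximate 95% CI."""
    ratio = (deaths_int / py_int) / (deaths_ctl / py_ctl)
    se_log = math.sqrt(1 / deaths_int + 1 / deaths_ctl)  # SE of log(rate ratio)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Hypothetical trial: 400 versus 500 cause-specific deaths,
# 500,000 person-years in each arm.
ratio, lower, upper = rate_ratio_ci(400, 500_000, 500, 500_000)
print(f"Rate ratio {ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that lies entirely below 1, as in this hypothetical example, corresponds to a statistically significant reduction in cause-specific mortality.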

If it is desired to generate an adjusted ratio due to suspected 
confounding, proportional hazards models can be used. 
Confounding is unlikely in well-designed and well-executed 
RCTs, but it is often worthwhile to explore the possibility. If con- 
founding by measured factors is not present, the unadjusted and 
adjusted ratios will be similar. Proportional hazards models do not 
produce rate ratios; instead, they produce hazard ratios, which 
reflect the instantaneous risk of death. Hazard ratios are compa- 
rable to mortality rate ratios as the two types of ratios produce the 
same information: a relative measure of the chance of death in the 
intervention arm versus the chance of death in the control arm. 

From the counterfactual principle comes the expectation that, 
prior to application of the intervention, the same number of can- 
cers and cancer deaths would emerge in the two trial arms as time 
passes. Thanks to randomization, the intervention arm partici- 
pants have counterparts in the control arm who would have the 
same experience, including cancer diagnosis and death, if screen- 
ing did not occur. The intervention arm will quickly begin to 
accrue more cancer cases than the control arm once screening 
begins, primarily because of lead time. In the absence of
overdiagnosis, the number of cancers is expected to equalize at some
point after screening stops, a phenomenon called catch-up. In the 
presence of overdiagnosis, catch-up does not occur, because 
screening found cancers whose control arm counterparts do not 
present in the absence of screening. A stop-screen design allows 
the question of overdiagnosis to be addressed by comparing the 
numbers of cancers in the two arms at a point in time after screen- 
ing ceases. The appropriate point in time is based on beliefs about 
the natural history of disease. A stabilization of the difference in 
the number of cancers as time progresses is a good indication that 
catch-up is complete. That stable difference is the magnitude of 
overdiagnosis. This method for calculating overdiagnosis is called 
the excess incidence method. Another method for estimating 
overdiagnosis is discussed in Chap. 9. Assessing overdiagnosis in 
an RCT that does not utilize a stop-screen design cannot be done 
unless the length of the trial is longer than the longest of lead 
times. With a long enough observation period, the difference in 
cancer incidence between the trial arms will stabilize; the differ- 
ence at that point is the magnitude of overdiagnosis. 
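
The excess incidence method amounts to tracking the difference in cumulative cancers and reading it off once it stops changing. The sketch below uses hypothetical cumulative counts for a trial with equal arms in which screening stops after year 3; the stability rule (an unchanged difference over the last three years) is an illustrative choice, not a published criterion.

```python
def excess_incidence(cum_intervention, cum_control, stable_years=3, tolerance=0):
    """Return the stable excess of cancers, or None if catch-up looks incomplete."""
    differences = [a - b for a, b in zip(cum_intervention, cum_control)]
    tail = differences[-stable_years:]
    if max(tail) - min(tail) > tolerance:
        return None  # the difference is still changing
    return tail[-1]


# Hypothetical cumulative cancers by year of follow-up (equal-sized arms).
intervention = [120, 230, 330, 370, 400, 430, 460, 490]
control = [60, 130, 210, 300, 360, 400, 430, 460]

overdiagnosed = excess_incidence(intervention, control)
print(f"Estimated overdiagnosed cancers: {overdiagnosed}")
```

Here the difference peaks while screening is under way, shrinks during catch-up, and settles at a constant excess, which is taken as the number of overdiagnosed cancers.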

Cessation of screening can lead to dilution of the mortality rate 
ratio. Dilution occurs when a mortality rate ratio that suggested a 
benefit of screening moves closer to a null result (no benefit; a rate 
ratio of 1) as time passes without screening. The counterfactual 
principle explains why dilution occurs: after screening ends, the 
trial arms eventually will return to their pre-intervention states, a 
time when they were equivalent in terms of their mortality rates. 
Any beneficial effect of cancer screening will eventually cease. 
An RCT that does not utilize a stop-screen design will not experi- 
ence dilution. 

Most RCTs randomize in a 1-to-1 fashion, leading to equal 
sample sizes in the two arms. Discussion of overdiagnosis and 
catch-up assumed equal numbers were randomized to each arm. If 
other randomization schemes are used, expectations regarding 
catch-up must be adjusted. For example, a trial that employs a 
stop-screen design and randomizes in a 2 (intervention) to 1 (con- 
trol) fashion is expected to have twice as many cases in the inter- 
vention arm, if overdiagnosis does not exist. 
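
With unequal allocation, the control arm count must be scaled before the arms are compared. A minimal sketch, with a hypothetical function name and hypothetical counts:

```python
def excess_cases(cases_intervention, cases_control, allocation_ratio=1.0):
    """Excess cancers in the intervention arm after scaling for allocation.

    allocation_ratio is intervention participants per control participant.
    """
    expected_without_overdiagnosis = allocation_ratio * cases_control
    return cases_intervention - expected_without_overdiagnosis


# Hypothetical 2-to-1 trial after catch-up is complete.
no_overdiagnosis = excess_cases(920, 460, allocation_ratio=2)
with_overdiagnosis = excess_cases(980, 460, allocation_ratio=2)
print(no_overdiagnosis, with_overdiagnosis)
```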


6.2.3 Strengths and Weaknesses


The greatest strength of a cancer screening RCT is that results can 
be attributed to the intervention and not to a confounding factor, 
but only if randomization achieved its goal of creating two equiv- 
alent groups. The chance of that happening is positively corre- 
lated with the size of the trial arms. Screening trials that have the 
necessary statistical power to properly assess a cause-specific 
mortality rate ratio are almost guaranteed to have equivalent 
groups as long as nothing in the randomization process is system- 
atically awry. 

Other potential differences in the experience of the arms must 
be considered when interpreting the findings of a cancer screening 
RCT. Outcome ascertainment methods need to be equivalent for 
the two arms, as does treatment for a given stage of cancer. Most 
RCTs collect extensive amounts of data; therefore, the aforemen- 
tioned two conditions often can be assessed. However, it is impor- 
tant to remember that participants in the intervention arm will 
have more contact with trial staff during the screening period of 
the trial, which could lead to the two arms having different experi- 
ences at many points in the screening process. 

Standardized application of the screening regimen is a strength. 
An RCT is thought to provide the most favorable setting in which 
to evaluate a screening regimen; all steps in the screening process, 
from invitation to treatment, tend to occur with an extra level of 
forethought and rigor. 

Cancer screening RCTs are expensive and take a long time to 
complete. They require large numbers of participants for reasons 
of statistical power. If intervention arm participants do not receive 
the intervention of interest (referred to as non-compliance) or 
control arm participants do (referred to as contamination), statisti- 
cal power may be compromised if the degree of observed non- 
compliance and contamination is greater than what was assumed 
when the trial was designed. In the instance of extreme non- 
compliance and contamination, the trial arms become indistin- 
guishable and any comparison in mortality rates is meaningless. If 
the intervention is available outside the trial and either is 
inexpensive or covered by health insurance, high rates of
contamination are likely and may make an RCT impractical.
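
The way non-compliance and contamination pull the observed rate ratio toward 1 can be sketched with a simple mixing model. The model assumes that each arm is a blend of screened and unscreened people and that those who are and are not screened share the same underlying risk; that is a strong simplification, and all inputs are hypothetical.

```python
def arm_rate(share_screened, rate_unscreened, true_rate_ratio):
    """Mortality rate in an arm where only a share of people are screened."""
    rate_screened = rate_unscreened * true_rate_ratio
    return share_screened * rate_screened + (1 - share_screened) * rate_unscreened


def observed_rate_ratio(compliance, contamination, true_rate_ratio,
                        rate_unscreened=1.0):
    """Rate ratio seen when arms are compared as randomized."""
    intervention = arm_rate(compliance, rate_unscreened, true_rate_ratio)
    control = arm_rate(contamination, rate_unscreened, true_rate_ratio)
    return intervention / control


ideal = observed_rate_ratio(1.00, 0.00, true_rate_ratio=0.80)
typical = observed_rate_ratio(0.85, 0.15, true_rate_ratio=0.80)
extreme = observed_rate_ratio(0.50, 0.50, true_rate_ratio=0.80)

print(f"Full compliance, no contamination: {ideal:.2f}")
print(f"85% compliance, 15% contamination: {typical:.2f}")
print(f"50% compliance, 50% contamination: {extreme:.2f}")
```

When the same share of each arm is screened, the arms are indistinguishable and the ratio is 1 regardless of how well screening works.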


6.2.4 Example of an Individual-Level Cancer 
Screening RCT 


A number of cancer screening RCTs have been conducted,
and they vary with regard to rigor and availability of information
on their conduct. A well-conducted and well-documented cancer
screening RCT is the Prostate, Lung, Colorectal and Ovarian
Cancer Screening Trial (PLCO), which has been mentioned
previously. Informative publications include the primary outcome
papers [3–6] and methods and operations papers [2, 7]. The methods
and operations papers will be useful to those who are planning
to launch a trial or wish to learn more about the nuts and
bolts of how cancer screening RCTs are carried out.


6.3 Cluster-Level Randomized Controlled 
Trials of Cancer Screening 


6.3.1 Design Features 


A cancer screening cluster-level RCT is quite similar to an 
individual-level RCT. The only design difference is that cluster- 
level trials randomize groups rather than individuals. The number 
of groups must be at least two but can be more. If a group is ran- 
domized to receive the intervention, all eligible individuals in that 
group are invited to receive it. Groups often are geopolitical enti- 
ties, such as counties or provinces. The groups to be randomized 
must be similar enough for the counterfactual principle to hold. 


6.3.2 Analysis Features 


The same principles that hold for analysis of individual-level can- 
cer screening RCTs hold for cluster-level cancer screening RCTs, 
except in one instance. A cluster-level RCT usually is analyzed at
the cluster level, meaning that the cluster, rather than the individual,
is the unit of analysis [8]. When analyzed at the cluster level,
statistical analyses are straightforward, but results are applicable
only to clusters. For example, a cause-specific mortality rate ratio of
0.80 indicates that clusters that were offered the intervention have 
a 20% reduction in cause-specific mortality rates relative to those 
clusters that were not, not that individuals who were screened had 
a 20% reduction in cause-specific mortality. The conclusions are 
not guaranteed to be directly applicable to the individuals who 
reside in those clusters, although many times they are interpreted 
as if they are. 

It is inappropriate to analyze a cluster-level RCT as one would 
analyze an individual-level RCT; that is, it is inappropriate to use 
individuals as the unit of analysis rather than the cluster. 
Individuals within a cluster are rarely independent of one another. 
Lack of independence invalidates statistical assumptions on which 
methods rest and can lead to incorrect conclusions. There are, 
however, advanced statistical methods that can account for the 
lack of independence that accompanies individuals within clus- 
ters and allow for inferences to individuals [9]. 
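
As a minimal sketch of what it means to treat the cluster as the unit of analysis, the fictional example below computes one mortality rate per cluster and then compares the arm-level means of those cluster rates. The cluster counts, the function names, and the use of an unweighted mean of cluster rates are illustrative assumptions, not a method taken from any specific trial.

```python
from statistics import mean

# Fictional clusters: (cause-specific deaths, person-years) per cluster.
intervention = [(16, 5000), (12, 4000), (19, 6000)]
control = [(20, 5000), (17, 4000), (23, 6000)]

def cluster_rates(clusters, per=1000):
    """Return one mortality rate per cluster (deaths per `per` person-years)."""
    return [deaths / person_years * per for deaths, person_years in clusters]

def cluster_level_rate_ratio(arm_a, arm_b):
    """Ratio of the mean cluster rate in arm_a to the mean cluster rate in arm_b."""
    return mean(cluster_rates(arm_a)) / mean(cluster_rates(arm_b))

ratio = cluster_level_rate_ratio(intervention, control)
print(f"Cluster-level mortality rate ratio: {ratio:.2f}")
```

Each arm contributes only three data points here, which illustrates why a small number of clusters limits what a cluster-level analysis can show.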


6.3.3 Strengths and Weaknesses 


A cluster-level RCT of cancer screening can have very low rates 
of contamination if the new screening regimen is available in only 
certain clusters and it is difficult for individuals to cross into or 
receive medical services in other clusters. In addition, cancer 
mortality rates are often available for clusters that are geopolitical 
entities, eliminating the need for collection of mortality informa- 
tion as part of the RCT. However, compliance within a cluster can 
be low because individuals are usually not consulted before ran- 
domization. The number of clusters is often small, which can 
impact the ability of randomization to produce a true counterfac- 
tual group. 

Cluster-level RCTs of cancer screening are difficult to carry 
out in places with opportunistic screening. In the US,
randomization by state could be attempted, but ease of mobility
and out-of-network health insurance policy benefits, not to men- 
tion entrepreneurial ventures, could foster contamination. 
Cluster-level RCTs of cancer screening may be more easily 
done in countries with government-administered health care, 
although a Swedish cluster-level RCT of mammography screen- 
ing still experienced non-negligible rates of contamination in 
the control arm [10]. 


6.3.4 Example of a Cluster-Level Cancer 
Screening RCT 


In the United Kingdom (UK), the AgeX cluster-level RCT is look- 
ing at the impact of offering an additional breast cancer screen to 
women ages 47-49 and offering breast cancer screening every
3 years to women over 70 [11]. Most of the 80 breast cancer 
screening centers in the UK’s National Health Service are partici- 
pating. Each center is a cluster and is randomized to the interven- 
tion arm or the control arm. All women in intervention arm 
clusters are invited to receive the age-appropriate additional 
screens. All women in control arm clusters are invited to receive 
the standard breast cancer screening regimen. 


6.4 Pragmatic Randomized Controlled Trials 
of Cancer Screening 


RCTs of cancer screening usually have been carried out in highly 
controlled and near ideal settings. They have measured efficacy 
rather than effectiveness. Effectiveness can be addressed by prag- 
matic RCTs. 

A pragmatic RCT is done in the reality of everyday health
care, which introduces many challenges that can hinder the ability
of a cancer screening test to reduce mortality. Pragmatic trials
usually have fewer eligibility criteria than traditional RCTs.
Pragmatic RCTs typically do not hire staff dedicated to trial
operations; in other words, there usually are no extra resources for
recruitment or compliance. Data collection above and beyond
what is collected in usual care is not common.

Though randomization still occurs in pragmatic trials, patients 
may have the opportunity to receive what they want rather than 
what randomization assigns to them. While that may seem hereti- 
cal to a strict clinical trialist, the goal of a pragmatic trial is to 
evaluate the impact of introducing a cancer screening test in a 
community health care setting. The impact reflects the fact that 
some patients will accept the test and some will not. 

To learn more about pragmatic trials and patient-centered 
research in general, consult the National Institutes of Health 
(NIH) Collaboratory’s Living Textbook of Pragmatic Clinical 
Trials [12], a website that presents expert consensus regarding 
special considerations, standard approaches, and best practices in 
the design, conduct, and reporting of pragmatic clinical trials. 


6.4.1 Examples of Pragmatic Cancer 
Screening RCTs 


There are no completed pragmatic RCTs of cancer screening 
effectiveness, although there are at least two underway. The 
HOME trial, conducted in the Kaiser Permanente Washington managed care
system, is examining the ability of self-sampling to increase cer- 
vical cancer screening uptake and effectiveness [13]. Self- 
sampling could overcome certain real-world barriers to being 
screened, including lack of transportation to a clinic, lack of child 
care, and needing time off from work. It also could increase cervi- 
cal cancer screening uptake among women who prefer not to 
receive a pelvic exam. The WISDOM trial, conducted in clinics in 
California and South Dakota, is comparing breast cancer screen- 
ing regimens based on age to screening regimens based on risk 
[14]. WISDOM is using what is known as a preference-tolerant
design, which encourages randomization but allows women to 
self-assign if they wish. The reason for choosing such a design 
was to maximize participation, a factor that may lead to better 
generalizability of results.


7 Observational Research Designs


7.1 An Overview of Observational Study 
Designs 


Observational studies do not dictate the cancer screening regi- 
mens that their study subjects utilize. Instead, these studies col- 
lect data on individuals’ cancer screening practices, cancer 
outcomes, and other factors if needed. Because no regimens are 
dictated, an observational study can capture information about 
and evaluate a variety of cancer screening practices, including 
use of different tests or cancer screening regimens. Observational 
studies can be retrospective or prospective in nature, with the 
distinction dependent on how and when individuals are chosen 
for study inclusion. Prospective observational studies of cancer 
screening track individuals as they move forward in time until 
the event of interest happens or the study is complete. 
Retrospective observational studies of cancer screening look at 
past experiences of individuals who have had the event of inter- 
est and others who have not. Prospective observational studies 
are said to sample based on exposure (cancer screening experi- 
ence), while retrospective observational studies are said to sam- 
ple based on outcome (death). 

Observational studies provide weaker evidence than experimental
studies because observational studies are subject to confounding.
Confounding occurs when a third factor is associated
with both the cancer screening practice and cause-specific mortality,
meaning that the third factor is not equally present among
groups of individuals with different cancer screening practices 
and is not equally present among groups of individuals with dif- 
ferent cancer outcomes. An example comes from observational 
studies of colorectal cancer screening and colorectal cancer 
mortality. Individuals who exercise are more likely to have 
colorectal cancer screening and also are less likely to die of 
colorectal cancer. If an observational study observes a reduction 
in colorectal cancer mortality with cancer screening, we cannot be 
sure what is responsible. Is it cancer screening, exercise, a combi- 
nation of both, or some unknown protective factor that is more 
likely among individuals who receive cancer screening and who 
exercise? 

Confounding is a type of bias and leads to an incorrect estimate 
of the true relationship of cancer screening and cause-specific 
mortality. It is present in varying degrees in all observational stud- 
ies and while it can be dampened using statistical methods, these 
methods cannot eliminate all confounding because it is not pos- 
sible to measure all confounders or measure them accurately. 

Observational studies usually are less expensive and easier 
to perform than experimental studies. There are many reasons 
for that: the study usually does not administer or pay for the 
cancer screening test; existing databases often are used; and 
retrospective studies do not need to wait for time to pass since 
the data already have been collected. Some prospective studies 
can take as long as or longer than a randomized controlled trial
(RCT), however. Retrospective studies are often used as a first 
pass to examine a hypothesis about a cancer screening test, 
especially if use of that test is prematurely disseminating into 
community practice. 

Observational study designs that are frequently used in cancer 
screening assessment will be discussed: cohort, case-control, and 
ecologic. Single-arm studies, sometimes known as case series, 
will be presented as well. Readers who wish to learn more about 
observational research can consult Modern Epidemiology (3rd 
edition) by Rothman, Greenland, and Lash [1].


7.2 Cohort Studies


7.2.1 Design Features


A cohort is a group of people with something in common, either 
by nature or design, who are followed through time for an event 
of interest. Research cohorts can be created in one of two ways. 
Prospective cohorts are created in real time; data are collected as
time passes. Retrospective cohorts, also known as historic cohorts, 
are created after data have been collected. These cohorts comprise 
extracted data from pre-existing data sources, such as Medicare or 
the medical records of health maintenance organization members. 
Retrospective cohorts usually are analyzed as if their data had 
been collected prospectively and generally are constructed for the 
purpose of addressing pre-determined research questions. In pro- 
spective cohort studies, individuals usually are recruited to 
actively participate in the study, but with retrospective cohort 
studies, individuals usually do not know that their data are being 
used to answer a specific research question. 

Information on cohort experience can come from a variety of 
data sources. Prospective cohorts usually rely heavily on partici- 
pant interviews and participant-completed questionnaires, and 
may use medical records to validate procedures and diagnoses. 
Retrospective cohorts typically have little-to-no additional infor- 
mation collected on them. In both instances, deaths can be con- 
firmed with collection of death certificates, while death certificate 
cause of death can be verified with review of medical records that 
document clinical experiences prior to death. It is recommended 
that records for at least the 3-6 months prior to the date of death 
be considered [2]. 

No prospective cohorts have been established for the primary 
purpose of examining cancer screening effectiveness, though 
some have been established to collect other information about the 
cancer screening process in community settings. Some pre- 
existing prospective cohorts have been used to address effective- 
ness if cancer screening and death information are available. 
Many retrospective cohorts have been created to address a range 
of questions regarding cancer screening. Cohorts without
information on cancer screening can be repurposed by collecting
the needed information if it is available. 


7.2.2 Analysis Features 


Cohort members choose their cancer screening regimens, which 
means that confounding is all but guaranteed. Therefore, outcome 
measures need to be calculated using statistical models that allow 
for adjustment for confounding variables. If timing of death (date 
of death or person-years of experience) is available, Cox propor- 
tional hazards regression or Poisson regression can be used, with 
the choice determined by assumptions regarding whether the haz- 
ard of death changes over time [3]. Logistic regression can be 
used if information on timing of death is unavailable. 

Poisson regression produces a cause-specific mortality rate 
ratio, Cox proportional hazards regression produces a cause- 
specific hazard rate ratio, and logistic regression produces an odds 
ratio, which estimates a risk ratio (also known as a relative risk) in 
the case of a rare outcome like cancer. Each ratio represents a
measure of disease burden in the individuals who received the
cancer screening regimen of interest divided by the same measure in
those who did not. When assessing cancer screening data, the exact measure
used (mortality rate, hazard rate, or odds) is of less importance 
than the ratio that they will produce. The three methods, when 
applied to a cancer screening cohort with typical experience, usu- 
ally will produce ratios that lead to the same conclusion about the 
benefit of cancer screening. 
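
The claim that an odds ratio approximates a risk ratio when the outcome is rare can be checked with simple arithmetic. The sketch below uses fictional counts (the numbers and function names are assumptions for illustration) and compares the two measures for a rare and a common outcome.

```python
def risk_ratio(deaths_screened, n_screened, deaths_unscreened, n_unscreened):
    """Risk of death among the screened divided by risk among the unscreened."""
    return (deaths_screened / n_screened) / (deaths_unscreened / n_unscreened)

def odds_ratio(deaths_screened, n_screened, deaths_unscreened, n_unscreened):
    """Odds of death among the screened divided by odds among the unscreened."""
    odds_s = deaths_screened / (n_screened - deaths_screened)
    odds_u = deaths_unscreened / (n_unscreened - deaths_unscreened)
    return odds_s / odds_u

# Fictional cohorts of 10,000 screened and 10,000 unscreened individuals.
rare = (30, 10_000, 40, 10_000)          # deaths are rare (under 1%)
common = (3_000, 10_000, 4_000, 10_000)  # deaths are common (30-40%)

for label, counts in (("rare", rare), ("common", common)):
    print(label, round(risk_ratio(*counts), 3), round(odds_ratio(*counts), 3))
```

With the rare outcome the two ratios nearly coincide; with the common outcome the odds ratio moves noticeably further from 1 than the risk ratio.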

Risk difference measures are sometimes used to describe how 
the absolute rather than relative magnitude of disease burden 
changes with cancer screening. To calculate a risk difference, the 
measure of interest (incidence rate, mortality rate, or hazard rate) 
in the presence of cancer screening is subtracted from the measure 
of interest in the absence of cancer screening. For example, a 
cause-specific mortality rate of 4 per 1000 person-years in the 
absence of cancer screening and a cause-specific mortality rate of 
3 per 1000 person-years in the presence of cancer screening result 
in a risk difference of 1 per 1000 person-years. Difference mea- 
sures are more useful than relative measures when considering 
health care resource allocation. 
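
The worked example above can be reproduced in a few lines. The sketch below uses the rates from the text (4 and 3 per 1000 person-years); the function names are illustrative assumptions. It also computes the corresponding rate ratio, to contrast the relative and absolute measures.

```python
def rate_difference(rate_unscreened, rate_screened):
    """Absolute difference: rate without screening minus rate with screening."""
    return rate_unscreened - rate_screened

def rate_ratio(rate_screened, rate_unscreened):
    """Relative measure: rate with screening divided by rate without screening."""
    return rate_screened / rate_unscreened

# Cause-specific mortality rates per 1000 person-years, taken from the text.
without_screening = 4.0
with_screening = 3.0

difference = rate_difference(without_screening, with_screening)
ratio = rate_ratio(with_screening, without_screening)
print(f"Difference: {difference} per 1000 person-years; ratio: {ratio}")
```

The same ratio of 0.75 would arise from rates of 0.4 and 0.3 per 1000 person-years, but the difference would then be only 0.1 per 1000 person-years, which is one way to see why difference measures matter for resource allocation.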


7.2.3 Strengths and Weaknesses


Cohort studies allow for evaluation of effectiveness, something of 
the utmost importance because the manner in which cancer 
screening is utilized in community settings is often quite different 
from the idealized regimens in RCTs. For example, an RCT might 
test an annual regimen, but the regimen that evolves in the com- 
munity could have longer or varied cancer screening intervals, 
especially when the cancer screening test is not fully acceptable to 
community members. Cohort studies also can be used to examine 
uptake of a new cancer screening test or test performance 
measures. 

The value of a cohort often depends on the extent of confound- 
ing and timing of data collection. The chance and possible impact 
of confounding must be discussed whenever cohort data are pre- 
sented. Regarding timing, cohort data are analyzed as if data were 
collected with the passing of time, meaning that collection of 
information on exposure (cancer screening and confounding) 
occurs before the outcome (cause-specific mortality) has occurred 
or is known. The data for some cohorts are collected that way, but 
for cohorts that are repurposed, researchers often need to collect 
information on past events. These data may be affected by recall 
bias, which happens when the passage of time results in data 
errors that then lead to incorrect estimates of the true relationship 
between cancer screening and cause-specific mortality. For exam- 
ple, let’s say a cohort is used to examine the ability of colonos- 
copy to reduce colorectal cancer mortality. When participants are 
asked about cancer screening use in the past 10 years, some may 
erroneously report a past colonoscopy when in fact their exam 
was a flexible sigmoidoscopy, which also reduces colorectal cancer
mortality but to a lesser degree. Non-trivial error in reporting
would lead to an observed association between colonoscopy and
colorectal cancer mortality that is weaker than the real association.
The recall error would lead to measurement of the impact of receiving
flexible sigmoidoscopy or colonoscopy, rather than only colonoscopy.
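
A short calculation illustrates why this kind of misreporting weakens the observed association. In the sketch below, every number (the mortality rates and the share of reported colonoscopies that were actually sigmoidoscopies) is a hypothetical value chosen for illustration, and unscreened individuals are assumed to report their status correctly.

```python
def observed_rate_ratio(rate_colonoscopy, rate_sigmoidoscopy, rate_unscreened,
                        share_misreported):
    """Rate ratio seen when some reported colonoscopies were sigmoidoscopies.

    `share_misreported` is the share of person-years in the 'reported
    colonoscopy' group that actually belongs to sigmoidoscopy recipients,
    so the group's mortality rate is a weighted average of the two rates.
    """
    mixed_rate = ((1 - share_misreported) * rate_colonoscopy
                  + share_misreported * rate_sigmoidoscopy)
    return mixed_rate / rate_unscreened

# Hypothetical colorectal cancer mortality rates per 10,000 person-years.
RATE_COLONOSCOPY = 2.0
RATE_SIGMOIDOSCOPY = 3.5
RATE_UNSCREENED = 5.0

true_ratio = observed_rate_ratio(RATE_COLONOSCOPY, RATE_SIGMOIDOSCOPY,
                                 RATE_UNSCREENED, share_misreported=0.0)
biased_ratio = observed_rate_ratio(RATE_COLONOSCOPY, RATE_SIGMOIDOSCOPY,
                                   RATE_UNSCREENED, share_misreported=0.25)
print(f"True rate ratio: {true_ratio:.3f}; observed: {biased_ratio:.3f}")
```

Under these assumptions the observed ratio sits closer to 1 than the true ratio, that is, the apparent benefit of colonoscopy is understated.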

Needless to say, it’s best to collect information as soon as pos- 
sible after an exposure occurs. It is probably best to collect 
information on past cancer screening activities from medical 
records or health insurance claims rather than participant
interviews, although medical records can be lost, and laws may be
enacted that make use of both sources more difficult. Also, medical
records do not always provide complete or correct information.
They are subject to human error and, in some instances, to
creative procedure coding to maximize insurance reimbursement.

To have adequate statistical power, cohort studies evaluating 
cancer screening usually need to be large and have a number of 
years of follow-up. Establishment of a new cohort and the infrastructure
to track the experience of the participants will be expensive,
although typically less expensive than an RCT, because cancer
screening activity and follow-up occur as part of community
care. Repurposing of an existing cohort can save money and time,
but the need for additional data will lead to a reduction in resources 
saved. 


7.2.4 Variations 


A nested case-control study of cancer screening uses all cause- 
specific deaths in a cohort as cases, but only a subset of the rest of 
cohort members as controls. This design is used when additional 
data collection is needed and is expensive or time-consuming, as 
in the situation of needing to determine the indication for a medi- 
cal test. Nested case-control studies of cancer screening are con- 
structed and analyzed in the same manner as case-control studies 
of cancer screening; the only difference is that cases and controls 
are drawn from an established cohort rather than another source. 
Details of case-control studies of cancer screening, nested and 
otherwise, are presented later on in this chapter. 


7.2.5 Examples of Cancer Screening Cohort 
Studies 


The Breast Cancer Surveillance Consortium (BCSC) is a prospective
cohort study of breast cancer screening. It is a cancer screening test
registry: information on screening
mammograms and other breast cancer screening imaging tests is
collected, as well as information on the women who receive them.
The unit of analysis is often a test rather than a woman. The BCSC
is not intended to evaluate cancer screening effectiveness; instead,
it strives to “assess and improve the delivery and quality of breast
cancer screening and related patient outcomes.” The cohort has
been used to evaluate important issues in breast cancer screening, 
including screening adherence, test performance, and supplemen- 
tal screening [4]. 

The Nurses’ Health Study (NHS) and Health Professionals
Follow-up Study (HPFS) are ongoing prospective cohort studies,
each designed to explore causes of major health conditions in the 
US. The NHS began in 1976 and the HPFS in 1985. Both studies 
added questions to their self-administered questionnaires in 1990 
regarding receipt of lower endoscopy (colonoscopy or sigmoidos- 
copy). The researchers published findings on the impact of lower 
endoscopy on colorectal cancer mortality in 2013 [5]. 

Kaiser Permanente Northern California (KPNC) is an integrated
health care delivery system with more than 4 million members.
Its extensive electronic health databases have been used
to address many questions in cancer etiology and prevention, 
including cancer screening. An example of how a health care 
organization’s databases can be used to conduct a retrospective 
cohort study of cancer screening can be found in KPNC’s report 
on the long-term risk of colorectal cancer death after a negative 
colonoscopy [6]. 


7.3 Case-Control Studies


7.3.1 Design Features


A case-control study is retrospective in nature, meaning that all 
exposures and events have occurred before the study begins. A 
case-control study includes cases, who are individuals who had 
the outcome of interest, and controls, who are individuals who did 
not have that outcome at a point in time that is determined by the 
case’s experience. The design has been used extensively in cancer 
etiology studies. A case-control study often aims to include the
universe of cases: all individuals who experience the outcome of
interest during a specific time period. Controls are randomly sampled
(usually within age strata) from the population that gave rise
to the cases. In case-control studies of cancer etiology, a
population-based roster, such as a list of driver’s license holders,
is used to sample controls.

In principle, case-control studies of cancer screening are the 
same as case-control studies of etiology. Cases are individuals 
who have died due to the cancer of interest. Controls are selected 
for a specific case, with random selection usually stratified on age 
and sex of the case. In addition, controls must not have been diag- 
nosed with the cancer of interest prior to the case’s diagnosis date; 
the reason is to ensure an equal and contemporaneous opportunity
for cancer screening. Some case-control studies of cancer screen- 
ing have required that controls be alive on the date of the case’s 
death. Cancer screening experience during a specific period (as 
discussed below) is compared in cases and controls. 

Case-control studies of cancer screening usually select their 
cases and controls from health system patient rosters because 
access to medical records is a necessity. Medical records are used 
to determine whether a test was done for cancer screening as 
opposed to diagnostic evaluation, and to obtain details of cancer
diagnoses. As was noted earlier in this chapter, case-control stud- 
ies of cancer screening also can be constructed by selecting cases 
and controls from an established cohort. 


7.3.2 Analysis Features 


Case-control studies of cancer screening are designed and ana- 
lyzed as matched case-control studies because exposure assign- 
ment for controls is defined by the experience of a case. 
Conditional logistic regression models are used to account for the 
matching and to adjust for other possible confounders. Logistic 
regression produces an odds ratio; in the instance of a case-control
study of cancer screening, it is the odds of receiving
cancer screening among those who died of the cancer of interest
divided by the odds of receiving cancer screening among those
who did not die of the cancer of interest.
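
For the special case of one control matched to each case and no additional covariates, the conditional odds ratio reduces to the ratio of the two kinds of discordant pairs. The sketch below illustrates that special case with fictional pairs; it is not a substitute for a conditional logistic regression model, and the data and function name are assumptions.

```python
def matched_pair_odds_ratio(pairs):
    """Conditional odds ratio for 1:1 matched pairs.

    Each pair is (case_screened, control_screened). Only discordant pairs,
    in which exactly one member was screened, contribute to the estimate.
    """
    case_only = sum(1 for case, control in pairs if case and not control)
    control_only = sum(1 for case, control in pairs if control and not case)
    if control_only == 0:
        raise ValueError("No pairs in which only the control was screened")
    return case_only / control_only

# Fictional matched pairs: (case screened?, matched control screened?).
pairs = (
    [(True, True)] * 30      # both screened (concordant, ignored)
    + [(False, False)] * 40  # neither screened (concordant, ignored)
    + [(True, False)] * 12   # only the case was screened
    + [(False, True)] * 20   # only the control was screened
)

print(f"Matched odds ratio: {matched_pair_odds_ratio(pairs):.2f}")
```

An odds ratio below 1 indicates that individuals who died of the cancer of interest were less likely to have been screened than their matched controls.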

The primary challenge in analysis of case-control studies of
cancer screening is assessing cancer screening exposure. An 
exposure window, one that reflects the period when cancer screen- 
ing could have been beneficial to cases (Phase B as defined in 
Chap. 2), must be defined. The exposure window for cases ends 
no later than the date of diagnosis, and usually ends prior to the 
date of diagnosis to exclude the period when cases were undergo- 
ing diagnostic evaluation. Controls are given a reference date, 
which corresponds to the final date of their matched case’s expo- 
sure window. Only cancer screening experience prior to that date 
is considered to be in the exposure window. Cancer screening 
tests that occurred in the distant past should be excluded if there is 
reason to believe that they were done prior to the time the case’s 
cancer was in Phase B. 
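
The exposure window logic described above can be sketched as a small date filter. The lengths used here (excluding the 90 days before diagnosis and looking back 5 years) and all names are hypothetical choices for illustration; in a real study they would need to be justified by what is known about the cancer's Phase B.

```python
from datetime import date, timedelta

def exposure_window(case_diagnosis_date, diagnostic_exclusion_days=90,
                    lookback_years=5):
    """Return (start, end) of the exposure window for a case and its controls.

    The window ends before the case's diagnosis date to exclude diagnostic
    work-up, and starts `lookback_years` (approximated as 365-day years)
    before that reference date.
    """
    end = case_diagnosis_date - timedelta(days=diagnostic_exclusion_days)
    start = end - timedelta(days=365 * lookback_years)
    return start, end

def screened_in_window(test_dates, window):
    """True if any screening test date falls inside the window (inclusive)."""
    start, end = window
    return any(start <= test_date <= end for test_date in test_dates)

# Fictional case diagnosed on 1 July 2010; the matched control shares the window.
window = exposure_window(date(2010, 7, 1))
case_tests = [date(2010, 6, 1)]      # during diagnostic work-up: excluded
control_tests = [date(2008, 3, 15)]  # inside the window: counted

print(window, screened_in_window(case_tests, window),
      screened_in_window(control_tests, window))
```

Rerunning such a filter with different window lengths is one simple way to carry out a sensitivity analysis of the window definition.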


7.3.3 Strengths and Weaknesses 


Case-control studies of cancer screening are retrospective research 
and can be done more quickly and inexpensively than cohort stud- 
ies or RCTs. The number of cases is known at the start of the 
study, and controls are selected only if they match to a known 
case. Detailed information, such as that found in medical records, 
usually is needed to determine whether a test was for cancer 
screening and whether it occurred within the exposure window. 

Confounding is a concern in case-control studies of cancer 
screening. Recall bias may be of concern if medical record 
abstractors are aware of the study hypothesis, or if medical records 
are systematically missing information, or are systematically 
unavailable. Because the exposure window must be inferred, it 
never will correctly capture the exact period in which cancer 
screening could have been of benefit to the cases. The exposure 
window must be thoughtfully chosen, and sensitivity analyses can 
be used to explore the impact of varying its definition. 

The many methodologic challenges in design and analysis of 
case-control studies of cancer screening are discussed in Cronin 
et al. [7] and Weiss [8]. 




7.3.4 Example of a Case-Control Study of Cancer
Screening


Using data on women residing in Saskatchewan, Pocobelli and 
Weiss conducted a case-control study of breast cancer mortality in 
relation to receipt of screening mammography [9]. Saskatchewan 
has a universal health care system funded by the government, with 
nearly all residents eligible for coverage. About 90% of residents 
also are eligible for province-funded outpatient prescription drug 
benefits. The cases and controls for the cancer screening study 
were sampled from a larger study that utilized the roster of women 
with drug benefits. Cases were women who died due to breast 
cancer at 50-79 years of age during the years 1990-2008. Controls 
were selected for each case and were women who had the same 
birth year as the case and were not diagnosed with breast cancer 
prior to the case’s date of diagnosis. Additional methodologic 
considerations, including definition of the exposure window, are 
discussed in the paper. 


7.4 Ecologic Studies


7.4.1 Design Features


An ecologic study is the observational equivalent of a cluster- 
level RCT: the experience of groups, usually geopolitical entities, 
rather than individuals, is examined. The outcome of interest is 
cause-specific mortality rates, and the exposure is a measure of 
cancer screening utilization in the entity. Ecologic studies of can- 
cer screening often compare cause-specific mortality rates for 
countries with different degrees of cancer screening utilization. 


7.4.2 Analysis Features


Data from ecologic studies of cancer screening often are presented
using simple two-axis plots. Cause-specific mortality rates
are plotted on one axis and their associated cancer screening use
metric is plotted on the other. Percentage of eligible individuals
screened is an example of a metric that has been used in ecologic 
studies. If cancer screening reduces cause-specific mortality, a 
graph of cause-specific mortality rates on the y-axis and the can- 
cer screening metric that measures use on the x-axis should pro- 
duce a pattern of negative correlation. Figure 7.1 presents a 
fictional ecologic study in which utilization of cancer screening 
and cause-specific mortality are negatively correlated, as sug- 
gested by a fitted line that slopes downward. We cannot assume 
that cancer screening is the reason for the decrease in cause- 
specific mortality as other factors may be at play. In the instance 
of an ecologic study that suggests a reduction in cause-specific 
mortality, changes or regional differences in cancer treatment 
need to be carefully considered as confounders. Accuracy of the 
summary measures must be considered as well. 
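
The negative correlation described above can be quantified with a Pearson correlation coefficient and a least-squares line. The sketch below uses fictional entity-level data (not the data behind Fig. 7.1) and plain Python; all names are illustrative assumptions.

```python
from statistics import mean

def pearson_and_slope(x, y):
    """Return (Pearson correlation, least-squares slope) for paired values."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5, sxy / sxx

# Fictional entities: percent of eligible individuals screened and
# cause-specific mortality per 10,000 person-years.
percent_screened = [20, 35, 50, 65, 80]
mortality_rate = [4.1, 3.8, 3.6, 3.1, 2.9]

r, slope = pearson_and_slope(percent_screened, mortality_rate)
print(f"Correlation: {r:.2f}; slope: {slope:.3f} per percentage point")
```

A negative slope is consistent with a benefit of cancer screening but, as noted above, does not by itself establish one.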

Ecologic studies can provide compelling evidence that cancer
screening implementation has not led to reductions in
cause-specific mortality for some cancer sites. In Fig. 7.2, the
cause-specific mortality rate hovers between 3.5 and 4.0 per
10,000 person-years regardless of cancer screening uptake, and
the fitted line suggests no negative correlation. It is unlikely that
such a pattern would mask a benefit of cancer screening due to
confounding, as the confounding factor would need to increase
cause-specific mortality and increase cancer screening use.
Though very high-risk individuals do tend to receive cancer
screening more frequently, they are a small percentage of the
individuals in a population, and cannot drive entity-level rates unless
most deaths from cancer occur in the high-risk group.

Fig. 7.1 Plot of data from an ecologic study that suggests a benefit of cancer
screening. PY stands for person-years. Data are fictional

Fig. 7.2 Plot of data from an ecologic study that suggests no benefit of
cancer screening. PY stands for person-years. Data are fictional


7.4.3 Strengths and Weaknesses 


Ecologic studies of cancer screening are usually easier to under- 
take and less expensive than individual-level observational stud- 
ies. Cause-specific mortality rates are publicly available for 
geographic entities in the US and elsewhere. Obtaining data on 
cancer screening use within a group is more challenging, but data 
sources already exist in the US for some cancer screening prac- 
tices. Electronic medical records or administrative claims from a 
universal provider such as Medicare could be used as well. It can, 
however, be challenging to identify a set of entities to compare. To 
determine appropriateness, it is useful to return to the counterfactual
principle and choose entities that are as comparable as possible
except for cancer screening practices.

Ecologic studies have a number of shortcomings. There can
be confounding at the entity level, although linear regression 
can be used to adjust for some degree of the influence of con- 
founding factors if that information is available. The results may 
not be applicable to the individuals within the entities, as was 
discussed in the context of cluster-level RCTs. Measures of can- 
cer screening utilization that are not calculated in conjunction 
with individual-level medical records often are overestimates, as 
they can reflect use of cancer screening modalities that also can be
used for diagnostic purposes. For example, a measure of colo- 
noscopy utilization derived by counting all colonoscopies per- 
formed will include both screening colonoscopies and diagnostic 
colonoscopies. 


7.4.4 Variations 


A time trend study is a type of ecologic study that examines 
changes in cancer mortality rates as time passes, with time as a 
marker for changes in cancer screening practice. Time trend stud- 
ies are useful for examining changes in cause-specific mortality 
rates after cancer screening is introduced or after cancer screening 
regimens change. A two-axis graph can be used, with cause- 
specific mortality rates on the y-axis and year on the x-axis. A 
metric of cancer screening utilization, if available, can be included 
by using a second (right-sided) y-axis. Otherwise, milestones in 
cancer screening practices, such as the year that cancer screening 
was recommended for the first time, can be annotated. Figure 7.3 
is an example, again fictional, of such a graph. Rates of cause- 
specific mortality decrease soon after widespread recommenda- 
tion of cancer screening (1993). We cannot, however, assume that 
the decrease in mortality is due to cancer screening; other 
concurrent changes, such as treatment improvements, that might 
explain the observed pattern need to be considered before draw- 
ing any conclusions. 

In this fictional example, cause-specific mortality is stable
prior to widespread recommendation of cancer screening in
1993. Cause-specific mortality begins to drop in 1994. It stabilizes
around 2000, perhaps due to a leveling off of cancer screening
utilization.

Fig. 7.3 Time trends in cancer mortality before and after recommendation of
cancer screening (1993). PY stands for person-years. Data are fictional


7.4.5 Examples of Ecologic Studies of Cancer 
Screening 


A current controversy in breast cancer screening is whether reduc- 
tions in breast cancer mortality are due primarily to screening or 
to improvements in treatment. To examine that question, Autier 
et al. examined breast cancer mortality rates in neighboring 
European countries with different histories of screening use but 
access to similar treatments [10]. Their ecologic analysis suggests 
that cancer screening has played only a minor role in improve- 
ments in breast cancer mortality. 

The use of thyroid cancer screening in South Korea began to 
increase in 1999 when it was offered as a paid add-on test to the set 
of cancer screening tests offered for free through a national cancer 
screening program. No changes in thyroid cancer mortality were 
observed as utilization increased, and in 2013, use began to wane 
due to the evidence of no benefit and compelling evidence of overdi- 
agnosis [11]. 


7.5 Single-Arm Studies 
7.5.1 Design Features 


In the context of cancer screening, a single-arm study refers to the 
experience, within a period of time, of a set of individuals who 
receive a screen in the context of a medical study. The screen is 
usually not standard of care. The test is considered to be experi- 
mental for cancer screening purposes, but a single-arm study is 
considered an observational study because it involves no random- 
ization. Single-arm studies are a type of cohort study in which no 
participants are unscreened. 


7.5.2 Analysis Features 


Cancer screening single-arm studies are used to assess perfor- 
mance of proposed tests, most notably, the ability of a test to lead 
to cancer detection at an early stage. These studies tend to enroll 
a small number of participants. Because there is no study com- 
parison group, results either are presented on their own or com- 
pared with those from published literature or a population-level 
database such as SEER. 


7.5.3 Strengths and Weaknesses 


Cancer screening single-arm studies are very limited in the infor- 
mation they can provide. The participants usually are a highly 
select group, suggesting that their experience is unlikely to be rep- 
resentative of what would occur in the general population. 
Participants typically are not chosen in a random fashion. They 
may be paid for their participation, or they may be required to pay 
to participate. Data collection usually does not include cause- 
specific mortality experience. Nevertheless, single-arm studies 
are a useful way to assess whether a proposed cancer screening 
test should receive further study. 


7.5.4 Variations 


A case series (or clinical series) is similar to a single-arm study. 
The difference is that case series include the clinical experiences 
of individuals not enrolled in a study. They are a culling of patients 
who have had the same exposure, in this case, cancer screening. 
Analyses examine their post-screening experience. The terms 
single-arm study and case/clinical series have been used inter- 
changeably. 


7.5.5 Examples of Cancer Screening Single-Arm 
Studies 


The Mayo Clinic initiated a single-arm study in 1999 to evaluate 
the performance of lung cancer screening with low dose com- 
puted tomography (LDCT) [12]. At that time, there was evidence, 
though not definitive, that LDCT screening might reduce lung 
cancer mortality. The purpose of this study was to address some 
of the outstanding questions in LDCT screening, including the 
magnitude of false positive tests and the prevalence of adverse 
downstream effects. Participants were offered four annual LDCT 
screens. They were current or former cigarette smokers, with for- 
mer smokers having quit less than 10 years ago. They also had at 
least a 20 pack-year history of smoking. 


7.6 Two-in-One Single-Arm Studies 
7.6.1 Design Features 


In a two-in-one single-arm study, individuals are offered the 
chance to receive cancer screening with an experimental test in 
addition to and at the same time as the standard of care cancer 
screening test. Each participant receives both tests, and each test 
is evaluated without knowledge of the results of the other. Action 
is taken if either test result is positive. 

Two-in-one single-arm studies have been used to determine if 
an experimental test has improved performance measures relative 
to the standard of care. They also have been used to examine 
whether an experimental screening test with a favorable feature 
(for example, lower cost, less invasiveness, or greater patient 
acceptability) has similar performance measures as the standard 
of care test. Two-in-one single-arm studies have been used to 
compare two tests already available in clinical settings. They also 
have been used to compare an experimental test with a diagnostic 
test, as diagnostic tests provide a definitive answer as to the pres- 
ence of cancer. 

A two-in-one single-arm study usually cannot be used to eval- 
uate tests beyond diagnosis, although excessively optimistic spec- 
ulation about the benefits of the experimental test is not 
uncommon. 


7.6.2 Analysis Features 


The analytic focus of a two-in-one cancer screening single-arm 
study is a comparison of the performance of the tests. Of most 
interest is how and when the two tests disagree. Tables 7.1 and 7.2 
present data from a fictional two-in-one single-arm study with 
1000 participants. Table 7.1 compares positivity rates and 
Table 7.2 compares cancer diagnoses. 

In Table 7.1, we see that both tests returned positive results for 
80 individuals. Twenty individuals, however, received a positive 
experimental test result and a negative standard of care test result. 


Table 7.1 Comparison of results from the standard of care and experimental 
screening tests 

                                    | Standard of care test             |
                                    | Positive result | Negative result | Total
Experimental test | Positive result |              80 |              20 |   100
                  | Negative result |               0 |             900 |   900
                  | Total           |              80 |             920 |  1000

Data are fictional 


Table 7.2 Cancer diagnoses by results of standard of care and experimental 
screening tests 

                                    | Standard of care test             |
                                    | Positive result | Negative result | Total
Experimental test | Positive result |              35 |              10 |    45
                  | Negative result |               2 |               3 |     5
                  | Total           |              37 |              13 |    50

Data are fictional 


The experimental test had a higher positivity rate, which may or 
may not indicate improvement over the standard of care test. A 
higher positivity rate could lead to a higher false positive rate or to 
additional cancer diagnoses. The meaning of additional cancer 
diagnoses is uncertain as well. They may represent cancers that 
are curable due to early detection, not curable regardless of early 
detection, or overdiagnosed. 

In Table 7.2, we see that 35 cancers were diagnosed after both 
a positive standard of care and experimental cancer screening test. 
An additional 10 cancers were diagnosed as a result of a positive 
experimental test, even though the standard of care test was nega- 
tive. We assume that these cancers were false negatives for the 
standard of care test, though we will never know what would have 
happened in the absence of the experimental test. 
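
The comparisons drawn from Tables 7.1 and 7.2 can be reproduced with a few lines of Python. This is only a sketch: the dictionary layout and function names are illustrative choices, not part of any study. The shares at the end use the 50 diagnosed cancers as the denominator, so they show agreement among known diagnoses and should not be read as true sensitivity.

```python
# Fictional counts from Tables 7.1 and 7.2.
# Keys are (experimental test result, standard of care test result).
screened = {("pos", "pos"): 80, ("pos", "neg"): 20,
            ("neg", "pos"): 0, ("neg", "neg"): 900}
cancers = {("pos", "pos"): 35, ("pos", "neg"): 10,
           ("neg", "pos"): 2, ("neg", "neg"): 3}


def margin(table, test, result):
    """Sum the cells in which one test (0 = experimental, 1 = standard) had the given result."""
    return sum(n for key, n in table.items() if key[test] == result)


def summarize(table):
    """Return the grand total, each test's positive margin, and the discordant count."""
    return {
        "total": sum(table.values()),
        "experimental_positive": margin(table, 0, "pos"),
        "standard_positive": margin(table, 1, "pos"),
        "discordant": table[("pos", "neg")] + table[("neg", "pos")],
    }


tests = summarize(screened)
dx = summarize(cancers)

# Positivity rate: positive results divided by all participants (Table 7.1).
positivity_experimental = tests["experimental_positive"] / tests["total"]
positivity_standard = tests["standard_positive"] / tests["total"]

# Share of diagnosed cancers that followed a positive result (Table 7.2).
share_flagged_experimental = dx["experimental_positive"] / dx["total"]
share_flagged_standard = dx["standard_positive"] / dx["total"]

print(positivity_experimental, positivity_standard)
print(share_flagged_experimental, share_flagged_standard)
print(tests["discordant"], dx["discordant"])
```

The discordant counts are the quantities of most interest here, because they show how often, and in which direction, the two tests disagree.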


7.6.3 Strengths and Weaknesses 


A two-in-one single-arm study may provide useful information if 
the standard of care test is known to reduce cause-specific mortal- 
ity and the experimental test appears to have increased sensitivity 
and positive predictive value (PPV). A demonstrated increase in 
those two performance measures is usually interpreted to mean 
that the new test is superior, but to make that leap, one must 
assume that more asymptomatic diagnoses will lead to a greater 
reduction in cause-specific mortality. The existence of overdiag- 
nosis and, for some cancers, equally efficacious treatment at a 
later stage, challenge that assumption. 

A cancer screening two-in-one single-arm study cannot provide 
definitive evidence of efficacy or effectiveness. In the 
instance of a test that has not yet disseminated, results are best 
used to make decisions regarding the need for an RCT. 


7.6.4 Examples of Two-in-One Single-Arm 
Studies 


Blood-based biomarker cancer screening tests are of particular 
interest in colorectal cancer screening as the available screening 
tests, lower endoscopy and fecal testing, are not palatable to many 
individuals. Testing for circulating methylated SEPT9 DNA has 
been under consideration as a way to screen for colorectal cancer. 
To examine the performance of SEPT9 testing, individuals who 
were scheduled for screening colonoscopy were invited to give 
blood plasma samples prior to their colonoscopy preparation 
regimen [13]. Performance measures for the SEPT9 test were 
calculated using the results and ultimate outcome of colonoscopy 
screening, the gold standard in both colorectal cancer screening 
and diagnosis. 

Screening mammography has evolved from the use of film-based 
to computer-based imaging. Film-based mammography provides 
only two-dimensional hard copy images. Digital mammography 
provides two-dimensional images that are read on a computer 
screen and can be manipulated to allow additional 
interpretation. The Digital Mammographic Imaging Screening 
Trial (DMIST) was designed to measure what were expected to 
be relatively small but potentially clinically important differ- 
ences in diagnostic accuracy between digital and film mammog- 
raphy [14]. Women enrolled in the trial received both tests on the 
same day, and each test was read by a different radiologist. 
Diagnostic evaluation was performed if either test was positive. 
Performance measures were calculated assuming that neither test 
was definitive. 


7.7 All Study Designs: Critical Data Elements 


Most studies of cancer screening, regardless of study design, col- 
lect a large amount of data. Critical data elements are those that 
are necessary for proper assessment of screening performance, 
effectiveness, or efficacy. Other data elements may be collected 
for ancillary studies, or they may be collected for “what if” situa- 
tions, but their collection must not jeopardize the collection of the 
critical data elements. Resources, including participant good will 
and staff time, always are limited. 

Not every research endeavor will be able to collect every data 
element, even the critical elements. As a result, not every research 
endeavor will be able to answer every question. Even so, studies 
that do not collect every critical element may provide useful infor- 
mation, although the limitations of the research in the absence of 
such data must be clearly stated. The inability to collect most 
critical data elements should lead to questions about the value of 
the research. 

Critical data elements for individual-level studies include date 
of birth; receipt, date and results of cancer screening tests; diagno- 
sis of the cancer of interest; date of diagnosis; cancer characteris- 
tics (including stage, histology and location); age at death; date of 
death; cause of death. Indication for all relevant medical tests or 
procedures that are proximal to either the date of screen or the 
date of diagnosis should be collected to differentiate cancer 
screening from diagnostic evaluation. If that information is not 
available, any information that can be used to derive indication 
should be collected. Other valuable data elements include cancer 
treatment procedures, and adverse events of any medical proce- 
dure associated with cancer screening, diagnostic evaluation, or 
cancer treatment. Risk factors for the cancer of interest, as well as 
other potential confounders, should be collected, especially in 
observational studies. 
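
One way to picture the individual-level elements listed above is as a record with one field per element. The sketch below is a hypothetical layout, with class and field names invented for illustration, and it derives age at death from the two dates instead of storing it separately. Treatment, adverse event, and risk factor fields are omitted for brevity.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ScreeningTest:
    test_date: date
    result: str                       # for example "positive" or "negative"
    indication: Optional[str] = None  # "screening" or "diagnostic", if known


@dataclass
class ParticipantRecord:
    date_of_birth: date
    tests: List[ScreeningTest] = field(default_factory=list)
    diagnosis_date: Optional[date] = None  # None if never diagnosed
    stage: Optional[str] = None
    histology: Optional[str] = None
    location: Optional[str] = None
    date_of_death: Optional[date] = None   # None if alive at end of follow-up
    cause_of_death: Optional[str] = None

    def age_at_death(self) -> Optional[int]:
        """Age in completed years at death, derived from the two dates."""
        if self.date_of_death is None:
            return None
        dob, dod = self.date_of_birth, self.date_of_death
        had_birthday = (dod.month, dod.day) >= (dob.month, dob.day)
        return dod.year - dob.year - (0 if had_birthday else 1)

    def tests_missing_indication(self) -> List[ScreeningTest]:
        """Tests that cannot yet be classified as screening or diagnostic."""
        return [t for t in self.tests if t.indication is None]


record = ParticipantRecord(
    date_of_birth=date(1950, 6, 15),
    tests=[
        ScreeningTest(date(2010, 3, 1), "negative", "screening"),
        ScreeningTest(date(2012, 4, 9), "positive"),
    ],
    diagnosis_date=date(2012, 5, 2),
    stage="localized",
    date_of_death=date(2020, 6, 14),
    cause_of_death="cancer of interest",
)
print(record.age_at_death(), len(record.tests_missing_indication()))
```

Flagging tests with no recorded indication makes it easy to see where screening cannot yet be separated from diagnostic evaluation.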

Ecologic studies of cancer screening require entity-level can- 
cer mortality rates and a metric of cancer screening use. One 
option for a test administered annually is to use the percent of 
residents who received a cancer screening test in the last 
12 months. Other useful data elements include measures of cancer 
screening availability and characteristics of the entity that may 
predict cancer screening behavior, such as percent of residents 
with a college degree. 


8 Cancer Prevention Screening 


Cancer screening for certain organs leads to detection of precan- 
cer. Detection of precancer is a form of cancer prevention if one 
uses the word cancer to exclusively mean invasive cancer, which 
is customary but not universal. Bretthauer and Kalager [1] have 
coined the phrase cancer prevention screening to refer to cancer 
screening practices that aim to detect precancer, and the phrase 
early detection cancer screening to refer to cancer screening prac- 
tices that aim to detect invasive cancer. 

Much of the theory and methodology regarding the assessment 
of cancer screening data arose during a time when the goal of 
cancer screening was to reduce cancer mortality by detection of 
invasive cancer at early stages. The reason for that goal was that 
technology was not advanced enough to detect precancer. It is fair 
to ask whether that theory and methodology still apply in our cur- 
rent era, one in which both invasive cancer and precancer disease 
are detected through cancer screening. They do, with one exception: 
the interpretation of changes in cancer incidence. The 
remainder of the principles laid out in the first seven chapters also 
are applicable to cancer prevention screening. This chapter pres- 
ents material of relevance to cancer prevention screening for each 
of the first seven chapters of this primer. 


8.1 Chapter 1: Foundations 


The NCI website mentioned in Chap. 1 defines precancerous as 
“a term used to describe a condition that may (or is likely to) 
become cancer. Also called premalignant” [2]. Most researchers 
use those terms as well as the term pre-invasive interchangeably. 
I prefer precancer because I find it to be broader in meaning than 
pre-malignant or pre-invasive. I use precancer to mean any 
change that is thought to be on the pathway to invasive cancer, be 
it DNA mutations in one cell or a tumor consisting of mutated 
cells that is on the verge of breaking through the basement 
membrane. In general, the material presented in Chapters 1 
through 7 is relevant to whatever abnormality cancer screening 
aims to find. 

Cancer prevention screening will be of value if some precan- 
cer detected through cancer screening would have become inva- 
sive and ultimately fatal cancer in the absence of cancer 
screening. Detection of precancer that does not meet that desig- 
nation represents overdiagnosis. The definition of overdiagnosis 
can be modified slightly to be inclusive: screen-detected precan- 
cer or invasive cancer that never would have been diagnosed, 
either as precancer or invasive cancer, in the absence of cancer 
screening. 

The overarching goal of both early detection cancer screening 
and cancer prevention screening is to reduce cause-specific mor- 
tality. We should not, however, assume that cancer prevention 
screening is merely early detection cancer screening at a very 
early stage, and that the benefits would be more extensive and 
harms less extensive than detection at a later stage. Precancer, at 
the time of detection, is not life-threatening as it cannot metasta- 
size. Advances in technology have led to detection of more and 
more precancerous abnormalities with uncertain clinical rele- 
vance, creating quandaries for clinicians and patients. It is almost 
certain that overdiagnosis is more prevalent in cancer prevention 
screening as compared with early detection cancer screening. 
Even so, treatment of precancer has the potential to be less oner- 
ous than treatment of invasive cancer. 


8.2 Chapter 2: Behind the Scenes 


Chapter 2 presented the four-phase model (Fig. 2.1). The model 
did not incorporate invasiveness of disease as it is immaterial to 
its purpose: to classify the stages of the natural history of cancer 
at which an abnormality, invasive or not, could be detected at an 
asymptomatic stage through cancer screening. While immaterial 
to the purpose of the model, the invasiveness of an abnormality is 
not immaterial to the assessment of cancer screening. 


8.3 Chapter 3: Performance Measures 


The building blocks of performance measures were presented in 
Chap. 3 (Table 3.1); a revised version that includes precancer is 
presented here as Table 8.1. Note that Table 8.1 does not 
discriminate between positive test results that are suspicious for 
precancer and invasive cancer. Today’s cancer screening tests, with 
the exception of cervical cytology, do not have that level of 
discriminatory ability. It is questionable whether they should, as 
cancer screening is not intended to provide that degree of 
information about the nature of suspicious abnormalities. 


Table 8.1 The building blocks of performance measures for cancer 
screening tests that detect precancer and invasive cancer 

                         | Truth
                         | Invasive cancer   | Precancer         | Neither present  | Total
                         | present (Phase B) | present (Phase B) | (Phase A or no   |
                         |                   |                   | cancer)          |
Screening   | Positive   | a_i               | a_p               | b                | a_i + a_p + b
test result |            | true invasive     | true precancer    | false positives  |
            |            | positives         | positives         |                  |
            | Negative   | c_i               | c_p               | d                | c_i + c_p + d
            |            | false invasive    | false precancer   | true negatives   |
            |            | negatives         | negatives         |                  |
            | Total      | a_i + c_i         | a_p + c_p         | b + d            | a_i + a_p + b + c_i + c_p + d

Performance measures for cancer screening tests that detect 
both precancer and invasive cancer can be calculated by combining 
the two if measuring the complete impact and performance of 
the cancer screening test is desired. Calculations would be the 
same as in Chap. 3, with a equaling a_i + a_p, and c equaling 
c_i + c_p. Cells b and d do not change in this instance. The 
interpretations do not change, although to be as precise as possible 
it should be said, for example, that sensitivity is the percent of 
individuals with precancer or invasive cancer who received a 
positive test, and that specificity is the percent of individuals with 
neither precancer nor invasive cancer who received a negative test. 
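
A minimal sketch of the combined calculation follows. It assumes the conventional formulas for sensitivity, specificity, and positive predictive value (PPV), which are presumed to match those in Chap. 3. The function name and the example counts are hypothetical.

```python
def combined_measures(a_i, a_p, b, c_i, c_p, d):
    """Performance measures with precancer and invasive cancer pooled.

    a_i, a_p: true invasive and true precancer positives
    c_i, c_p: false invasive and false precancer negatives
    b, d:     false positives and true negatives
    """
    a = a_i + a_p  # all true positives
    c = c_i + c_p  # all false negatives
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
    }


# Hypothetical counts for 1000 screened individuals.
measures = combined_measures(a_i=30, a_p=50, b=120, c_i=10, c_p=10, d=780)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these counts, 80 of the 100 individuals with precancer or invasive cancer received a positive test, so the pooled sensitivity is 80 percent.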

The quantity c_p is somewhat theoretical, as it is impossible to 
know whether a symptom-detected invasive cancer that is 
classified as a false negative was, at the time of the screen, a 
precancer or an invasive cancer. It is uncommon for a precancer to be 
detected due to symptoms, but when that occurs, it seems fair to 
count that precancer towards c_p. 

There are no hard and fast rules for calculating performance 
measures for precancer alone or invasive cancer alone when a 
cancer screening test detects both, though a compelling argument 
can be made for calculating sensitivity simply as a_p/(a_p + c_p) 
in the instance of precancer and a_i/(a_i + c_i) in the 
instance of invasive disease. For the other performance measures, 
the calculations will depend on how the outcome that is 
not of interest is classified and whether it is even included. If we 
wish, for example, to calculate performance measures for inva- 
sive disease, we have two options: precancer diagnoses could be 
excluded entirely from calculations, or screens that are associ- 
ated with precancer diagnoses can be counted as false positives. 
Cells b and d are affected, which means that any performance 
measure that utilizes them will be different for the two methods. 
Both options return results of similar magnitude if precancer 
and invasive cancer are rare. 
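
The two options for invasive disease can be compared side by side. The sketch below rests on one reading of the second option: positive screens associated with precancer move into cell b, and negative screens associated with precancer move into cell d. The treatment of the negative screens is an assumption, as are the function name, the option labels, and the counts.

```python
def invasive_only_measures(a_i, a_p, b, c_i, c_p, d, precancer="exclude"):
    """Performance measures for invasive cancer alone.

    precancer="exclude":        precancer diagnoses are dropped entirely.
    precancer="false_positive": positive screens with precancer join cell b and
                                negative screens with precancer join cell d
                                (the latter is an assumed reading).
    """
    if precancer == "exclude":
        false_pos, true_neg = b, d
    elif precancer == "false_positive":
        false_pos, true_neg = b + a_p, d + c_p
    else:
        raise ValueError("precancer must be 'exclude' or 'false_positive'")
    return {
        "sensitivity": a_i / (a_i + c_i),
        "specificity": true_neg / (false_pos + true_neg),
        "ppv": a_i / (a_i + false_pos),
    }


# Hypothetical counts for 1000 screened individuals.
counts = dict(a_i=30, a_p=50, b=120, c_i=10, c_p=10, d=780)
excluded = invasive_only_measures(**counts, precancer="exclude")
as_false_pos = invasive_only_measures(**counts, precancer="false_positive")
for name in excluded:
    print(f"{name}: {excluded[name]:.3f} vs {as_false_pos[name]:.3f}")
```

Sensitivity is the same under both options because it uses neither b nor d. Specificity and PPV differ, and the gap is noticeable here because precancer is deliberately common in the example counts.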


8.4 Chapter 4: Population Measures: 
Definitions 


The manner in which intermediate and definitive outcomes are 
calculated does not change. Incidence and case survival can be 
calculated for precancer and invasive cancer alone or combined. A 
category for precancer can be added to stage distributions. 
Mortality calculations will not change as they do not utilize diag- 
noses. 


8.5 Chapter 5: Population Measures: Cancer 
Screening’s Impact 


Recall from Chap. 5 that cancer screening that detects only inva- 
sive cancer will lead to an increase in invasive cancer incidence. 
Cancer screening that detects only precancer will lead to an 
increase in precancer. It also will lead to a decrease in invasive 
cancer incidence as long as not all precancer detected through 
cancer screening represents overdiagnosis. If a cancer screening 
test can detect both precancer and invasive cancer, the impact on 
invasive cancer incidence is difficult to predict. It will depend on 
many factors, including the ratio of precancer to invasive cancer 
detected through cancer screening, as well as the frequency of 
interval cancers and their stage (precancer or invasive). 

The other measures discussed in Chap. 5 will be affected as 
well, though none will “flip-flop” like cause-specific incidence. 
Consider, for example, case survival. Detection of invasive cancer 
inflates case survival, and detection of precancer inflates case 
survival to an even greater degree, because precancer occurs earlier in 
the natural history of cancer. 

A reduction in invasive cancer incidence is accepted as a defin- 
itive outcome in the case of cervical cancer screening and colorec- 
tal cancer screening with colonoscopy. Far more cervical 
precancer is detected than invasive cervical cancer. Years of wide- 
spread cervical cancer screening combined with unique aspects of 
cervical cancer natural history have led to extremely low incidence 
rates of invasive cervical cancer in much of the US. Screening 
with colonoscopy has led to a meaningful reduction in the number 
of invasive colorectal cancers, though its impact has yet to match 
that of cervical cancer screening. 

If cancer screening is of benefit, a reduction in invasive cancer 
incidence should be followed by a reduction in cause-specific 
mortality. If the former happens but not the latter, it is likely that 
detection at a precancerous stage offers no prognostic benefit 
compared with detection at an early invasive stage. Further dis- 
cussion of benefit in the absence of a cause-specific mortality 
reduction can be found in Chap. 9. 

The use of cancer incidence as a definitive outcome assumes 
that the benefit-to-harm ratio is similar or better for screen detec- 
tion of precancer relative to invasive cancer. That may not be the 
case: precancer, at the time of detection, is not life-threatening as 
it cannot metastasize. Unfortunately, population-based trends in 
detection of precancer either are not available or are based on 
incomplete ascertainment of the precancer that cancer screening 
can detect. That limits our ability to assess the entire impact of 
cancer screening, a serious issue given that detection of precancer 
through cancer screening is becoming a relatively common 
occurrence. 


8.6 Chapter 6: Experimental Research 
Designs 


All study designs described in Chap. 6 can be employed to inves- 
tigate cancer screening’s ability to reduce invasive cancer. 


8.7 Chapter 7: Observational Research 
Designs 


Case-control studies, the most complex of the study designs pre- 
sented in Chap. 7, need some modifications when detection of 
invasive disease is the outcome of interest [3]. 

A case-control study to assess the ability of a cancer screening 
test to reduce invasive cancer utilizes cases, individuals who have 
been diagnosed with invasive cancer, and matched controls. 
Controls must be alive at the time of the case’s diagnosis and must 
not have been diagnosed with invasive cancer during the case’s 
exposure window, which is the time during which the case’s inva- 
sive cancer could have been detected through cancer screening as 
precancer. The exposure window must not include the time that 
the case’s cancer could have been screen-detected as invasive can- 
cer. Cancer screening activity for both cases and controls is 
assessed for the exposure window. 

Data elements that provide information on death usually are 
not needed for studies of cancer prevention screening, as death 
occurs after the definitive outcome of diagnosis. 


8.7.1 Example of a Case-Control Study of 
Cancer Screening with an Outcome of 
Invasive Disease 


Newcomb et al. examined the ability of screening sigmoidoscopy 
to reduce colorectal cancer incidence [4]. Cases and controls 
resided in one of three counties in Washington State. Cases were 
identified using the SEER Puget Sound cancer registry, were 
between ages 20 and 74, and newly diagnosed with invasive 
colorectal adenocarcinoma. Controls were randomly selected 
according to the age and sex distribution of the cases (frequency- 
matching) using Washington State driver’s license data (ages 
20-64 years) and Medicare files (65 years and older). The expo- 
sure window included only those tests performed more than 
1 year prior to diagnosis date (cases) or more than 1 year prior to 
interview date (controls). Information on cancer screening history 
was collected using structured telephone interviews. The authors 
present their findings separately for proximal and distal colorectal 
cancer to reflect the anatomy of the colorectum and the inability 
of the sigmoidoscope to reach the proximal colon. 
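
The exposure window rule described above, in which only tests performed more than 1 year before the reference date are counted, can be sketched as a small helper. The function names are hypothetical, and treating "1 year" as one calendar year is an assumption made here, not a detail taken from the study.

```python
from datetime import date


def one_year_before(d):
    """Return the date one calendar year before d."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        # 29 February has no counterpart in the previous year.
        return d.replace(year=d.year - 1, day=28)


def screened_in_window(test_dates, reference_date):
    """True if any test occurred more than 1 year before the reference date.

    The reference date is the diagnosis date for a case and the
    interview date for a control.
    """
    cutoff = one_year_before(reference_date)
    return any(t < cutoff for t in test_dates)


reference = date(2000, 6, 1)
print(screened_in_window([date(1999, 7, 1)], reference))  # within the last year
print(screened_in_window([date(1998, 5, 1)], reference))  # more than 1 year prior
```

Applying the same rule to cases and controls keeps the assessment of cancer screening activity comparable between the two groups.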


9 Additional Considerations 


The topics in this chapter are relevant to assessment of cancer 
screening data but did not have an obvious home in the earlier 
chapters of this primer. As you will see, they are quite varied in 
scope. Each falls in one of three categories: data interpretation, 
methodology, and policy. 


9.1 Topics Regarding Data Interpretation 
9.1.1 Number Needed to Screen 


Number needed to screen, or NNS, indicates how many individu- 
als need to be screened so that one fewer individual dies of the 
cancer of interest. NNS is only relevant if cancer screening 
reduces mortality. NNS estimates for cancer screening tests tend 
to be in the hundreds to thousands of individuals. For example, 
the NNS for lung cancer screening with low dose computed 
tomography (LDCT) calculated from the National Lung Screening 
Trial (NLST) data was 320 [1]. 

The first step in calculating NNS is to subtract the cause- 
specific mortality rate in the presence of cancer screening from 
the cause-specific mortality rate in the absence of cancer screen- 
ing. That quantity, which is a rate, is called the absolute risk 
reduction, and is an indication of extent of death prevented by 
cancer screening. NNS equals the reciprocal of the absolute risk 
reduction. A fictional example is presented in Table 9.1. The abso- 
lute risk reduction in that table is 20 per 1000 person-years. The 
NNS is 1000/20, or 50. 

NNS is calculated assuming that the only factor that contrib- 
utes to the difference in mortality is cancer screening. It is best to 
use data from randomized controlled trials (RCTs), as data that 
come from other sources could reflect confounders of the screen- 
ing/cause-specific mortality relationship. 
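
The arithmetic for NNS can be written out as a short sketch using the fictional counts from Table 9.1. The function names are illustrative, and the calculation assumes, as stated above, that cancer screening is the only factor behind the difference in mortality.

```python
def rate_per_1000_py(deaths, person_years):
    """Cause-specific mortality rate per 1000 person-years."""
    return 1000 * deaths / person_years


def number_needed_to_screen(rate_unscreened, rate_screened):
    """Reciprocal of the absolute risk reduction, with rates given per 1000 PY."""
    absolute_risk_reduction = rate_unscreened - rate_screened
    if absolute_risk_reduction <= 0:
        raise ValueError("NNS is only relevant if cancer screening reduces mortality")
    return 1000 / absolute_risk_reduction


# Fictional data from Table 9.1.
screened_rate = rate_per_1000_py(deaths=100, person_years=10_000)
unscreened_rate = rate_per_1000_py(deaths=450, person_years=15_000)
nns = number_needed_to_screen(unscreened_rate, screened_rate)
print(screened_rate, unscreened_rate, nns)
```

The guard against a zero or negative absolute risk reduction reflects the point that NNS has no meaning when cancer screening does not reduce mortality.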


9.1.2 Generalizability of Results 


Generalizability refers to the applicability of results from a study, 
experimental or observational, to groups other than the study par- 
ticipants. Issues of generalizability are what drive the need to 
assess effectiveness. A cancer screening test may be efficacious in 
an RCT, but its ability to be effective in a community setting is not 
guaranteed by that finding. 

Most cancer screening guidelines are based on findings of 
RCTs. Because cancer screening RCTs are long, large, and expen- 
sive undertakings, few are done. Not surprisingly, the urge to take 
the results of an RCT conducted in one population and apply them 
to another population is strong. The populations at hand could be 
dissimilar regions of one country, two countries in the same part 
of the world with different health care systems, or two countries 
far away from one another with dramatically different cultural 
norms. 


Table 9.1 Calculating number needed to screen (NNS) 

           | Person-years (PY) | Number who die of the | Cause-specific
           |                   | cancer of interest    | mortality rate
Screened   |            10,000 |                   100 | 10 per 1000 PY
Unscreened |            15,000 |                   450 | 30 per 1000 PY

NNS calculations: 30 per 1000 PY - 10 per 1000 PY = 20 per 1000 PY; 
1000/20 = 50 
NNS is 50 

Data are fictional 

It should not be assumed that a beneficial effect of cancer 
screening seen in one population will be replicated in another 
population if the two populations have different risk factor pro- 
files. An example is lung cancer screening: the cancer screening 
process may not confer the same magnitude of benefit in asbestos 
workers, say, as it does in cigarette smokers. It is not wise to 
extrapolate results from one population to another if the two pop- 
ulations have different clinical practices, clinical resources, and 
access to health care. Low- and middle-income countries have 
begun to establish cancer screening programs based on experience 
in high-income countries, yet differences in medical 
resources, access to transportation, and rurality may not allow 
easy, frequent, or productive visits to cancer screening or treat- 
ment centers. Cultural norms also may impact cancer screening 
uptake and cancer treatment choices. 

The assumption that a null effect of cancer screening is gener- 
alizable from one population to another also can be unwise. A 
region with a preponderance of late-stage, untreatable cancers 
may benefit from cancer screening, whereas the same cancer 
screening practice may have little to no impact in a region where 
most patients have earlier stage disease for which treatment is 
available. 

Studies done in regions assumed to be similar enough to pro- 
duce comparable findings can and have produced conflicting 
results. The phenomenon has been observed in breast cancer 
screening, but the best example comes from prostate cancer 
screening. There are two notable RCTs of prostate cancer screen- 
ing: the Prostate, Lung, Colorectal and Ovarian Cancer Screening 
Trial (PLCO) [2] and the European Randomized Study of 
Screening for Prostate Cancer (ERSPC) [3]. PLCO, an RCT done 
in the US, found no reduction in prostate cancer mortality, while 
ERSPC, an RCT done in many countries in Europe, did. The two 
studies employed different cancer screening protocols, which 
may explain, at least in part, the discordant findings. Nevertheless, 
discussions regarding the conflicting results have focused on con- 
tamination in PLCO’s control arm and likely inferior prostate can- 
cer treatment in ERSPC’s control arm. Random variation or a 
systematic difference (that is, the contamination and treatment 
issues) may very well be responsible, but it also is necessary to 
consider the possibility that prostate cancer screening may be of 
benefit in one region but not the other. 


9.1.3 Concurrent Changes in Treatment 


Cancer screening does not operate in a vacuum. While cancer 
screening tests are under investigation, are disseminating, or have 
reached a steady state of use, changes in clinical practice are 
occurring as well. Advances have led to a better understanding of 
tumor composition, which in turn has led to new and highly 
effective therapies for some tumors. Cures are possible today that 
were not possible 20 years ago. This situation raises a question: if 
cancer treatment has improved, especially at regional and distant 
stages, is screen detection at an early stage still necessary? 

In the presence of concurrent changes in treatment, an RCT 
can still evaluate whether cancer screening is of benefit as long as 
individuals in both arms have access to the same treatments. 
Concurrent changes do present a problem in time trend studies; it 
is impossible to know whether reductions in cancer mortality are 
due to uptake of a new cancer screening regimen or availability of 
a new treatment. 

An RCT to determine whether a cancer screening test confers a
benefit cannot be established each time a shift in clinical practice
occurs. Creative use of available data can shed some light, how- 
ever. The ecologic study of Autier et al. [4], mentioned in Chap. 7, 
examined the issue of concurrent changes in breast cancer screen- 
ing uptake and treatment by examining time trends for three pairs 
of regions in Europe. Each region in a pair had similar access to 
breast cancer treatment yet a different date of widespread mam- 
mography adoption. While not without limitations, that analysis 
suggests that recent reductions in breast cancer mortality are not 
overwhelmingly due to cancer screening. 


9.2 Topics Regarding Methodology 
9.2.1 Microsimulation Modeling 


Microsimulation modeling of cancer screening is a technique in 
which computer-generated (fictional) life histories are manipu- 
lated by applying assumptions about factors that affect cancer 
screening outcomes. Models produce outcomes, such as cause- 
specific mortality, for a variety of assumptions and cancer screen- 
ing scenarios, providing insight into benefits and harms of cancer 
screening. The National Cancer Institute’s (NCI’s) Cancer 
Intervention and Surveillance Modeling Network (CISNET) ini- 
tiative has taken the lead in microsimulation modeling for cancer 
screening [5]. 

Microsimulation modeling is possible given unprecedented 
improvements in computational power in recent years. The use of 
microsimulation modeling in lieu of establishing RCTs has been 
suggested, because RCTs cannot address every proposed cancer 
screening strategy. Microsimulation modeling is arguably most 
valuable when done in conjunction with data from population- 
level databases, completed RCTs, or large, well-conducted pro- 
spective cohort studies, as certain assumptions needed to generate 
life histories can be based on real-life experience. 
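
The idea can be illustrated with a toy sketch in Python. It is not a
CISNET model. The function names (generate_histories, apply_screening)
and every parameter value (a 10% lifetime chance of entering Phase B,
entry between ages 50 and 80, an exponentially distributed Phase B with
a mean of 3 years, a test sensitivity of 80%) are assumptions made
purely for illustration. Fictional life histories are generated once,
and different screening scenarios are then applied to the same
histories.

```python
import random


def generate_histories(n_people, seed=0):
    """Generate fictional life histories.

    Each history is None (no cancer) or a pair of ages:
    (entry into Phase B, clinical diagnosis in the absence of screening).
    All parameter values are hypothetical.
    """
    rng = random.Random(seed)
    histories = []
    for _ in range(n_people):
        if rng.random() < 0.10:                 # assumed lifetime risk
            onset = rng.uniform(50, 80)         # assumed age at entry to Phase B
            sojourn = rng.expovariate(1 / 3.0)  # assumed mean Phase B of 3 years
            histories.append((onset, onset + sojourn))
        else:
            histories.append(None)
    return histories


def apply_screening(histories, screen_ages, sensitivity, seed=1):
    """Count how each history ends under one screening scenario."""
    rng = random.Random(seed)
    counts = {"no cancer": 0, "screen-detected": 0, "clinically detected": 0}
    for history in histories:
        if history is None:
            counts["no cancer"] += 1
            continue
        onset, clinical = history
        # A screen can find the cancer only while it is in Phase B.
        found = any(
            onset <= age < clinical and rng.random() < sensitivity
            for age in screen_ages
        )
        counts["screen-detected" if found else "clinically detected"] += 1
    return counts


if __name__ == "__main__":
    histories = generate_histories(100_000)
    scenarios = {
        "no screening": [],
        "biennial, ages 50-80": range(50, 81, 2),
        "annual, ages 50-80": range(50, 81),
    }
    for name, ages in scenarios.items():
        print(name, apply_screening(histories, ages, sensitivity=0.8))
```

Because the same histories are reused, differences between scenarios
reflect the screening assumptions rather than chance differences in who
develops cancer. A real model would add many more components, such as
stage progression, treatment, and death from other causes.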

No microsimulation model will perfectly replicate reality. 
However, these models have become a popular and useful tool to 
investigate “what if” situations. Results from CISNET models, in 
conjunction with RCT and cohort data, are now used by the 
United States Preventive Services Task Force [6] when developing
cancer screening recommendations.


9.2.2 Magnitude of Overdiagnosis 


The excess incidence method was presented in Chap. 6 as a way 
to calculate the degree of overdiagnosis in an RCT, but it is not the 
only method available. Some methods employ assumptions about 
the distribution of lead time [7], while others compare changes in 
incidence that have occurred over time, generally in conjunction
with other factors [8, 9]. Statistical modeling, including micro- 
simulation modeling, has been utilized in the effort to determine 
the magnitude of overdiagnosis or a range of plausible magni- 
tudes. 

There has been heated discussion as to which method will pro- 
duce the correct answer. That assumes, of course, that there is one 
correct answer. But overdiagnosis only exists in the context of 
cancer screening, and therefore, the magnitude of overdiagnosis is 
a function of aspects of the cancer screening regimen, including 
test, screening interval, compliance, and those who are screened. 
Magnitude also is a function of the intensity of diagnostic evalua- 
tion that follows a positive test. There is no one correct answer; 
there are many correct answers, with each dependent on many 
factors. 

The desire to quantify the magnitude of overdiagnosis is 
related to the desire to weigh the benefits and harms of cancer 
screening, something that is most easily done when a single num- 
ber can be attached to each. In lieu of a single number, a range of 
plausible measures of overdiagnosis can be used in sensitivity 
analyses. 
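
Such a sensitivity analysis can be sketched in a few lines of Python.
The function name and all of the inputs (50 screen-detected cancers and
2 deaths averted per 1000 people screened, and overdiagnosis fractions
of 10% to 30% of screen-detected cancers) are hypothetical values
chosen only to show the arithmetic; they are not estimates from any
trial.

```python
def overdiagnosed_per_death_averted(screen_detected, overdiagnosis_fraction,
                                    deaths_averted):
    """Overdiagnosed cases incurred for each cancer death averted.

    overdiagnosis_fraction is taken here to be the share of
    screen-detected cancers that are overdiagnosed (an assumption).
    """
    if deaths_averted <= 0:
        raise ValueError("deaths_averted must be positive")
    return screen_detected * overdiagnosis_fraction / deaths_averted


# Hypothetical inputs per 1000 people screened.
SCREEN_DETECTED = 50
DEATHS_AVERTED = 2

for fraction in (0.10, 0.20, 0.30):
    ratio = overdiagnosed_per_death_averted(
        SCREEN_DETECTED, fraction, DEATHS_AVERTED
    )
    print(f"overdiagnosis {fraction:.0%}: "
          f"{ratio:.1f} overdiagnosed per death averted")
```

Reporting the harm-to-benefit ratio across the whole range, rather than
at a single value, makes clear how much a conclusion depends on the
assumed magnitude of overdiagnosis.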


9.2.3 Incidence and Prevalence Screens 


When discussing burden of disease, the terms prevalence and 
incidence refer to disease that is existing and new, respectively. 
The terms prevalence and incidence are sometimes used in cancer 
screening to describe the initial and later screens, respectively, 
performed as part of a cancer screening program or an RCT. The 
initial screen is expected to lead primarily to detection of cancers 
that have stalled in Phase B, while incidence screens are expected 
to lead primarily to detection of cancers that have moved into 
Phase B since the last cancer screening test. All other things being 
equal, the yield on prevalence screens is expected to be higher 
than the yield on incidence screens. Also, the prognosis for can- 
cers detected on the prevalence screen is expected to be more 
favorable than for those detected on incidence screens. 
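
The expected difference in yield can be illustrated with a simplified
calculation in Python. It assumes a constant rate at which cancers
enter Phase B, an exponentially distributed Phase B, a perfectly
sensitive test, and a population that was never screened before the
prevalence screen. These assumptions, the function names, and the
numbers are illustrative and are not taken from any study. Under them,
the prevalence screen finds the whole standing pool of Phase B cancers,
while an incidence screen finds only those that entered Phase B since
the last screen and have not yet surfaced clinically.

```python
import math


def prevalence_screen_yield(onset_rate, mean_phase_b):
    """Expected cancers found per person at a first-ever screen.

    In a never-screened population at steady state, the pool of
    Phase B cancers is the onset rate times the mean Phase B length.
    """
    return onset_rate * mean_phase_b


def incidence_screen_yield(onset_rate, mean_phase_b, interval):
    """Expected cancers found per person at a repeat screen.

    The previous screen emptied the pool; over one interval it
    rebuilds only part of the way back to its steady-state size.
    """
    fraction_of_pool_rebuilt = 1 - math.exp(-interval / mean_phase_b)
    return onset_rate * mean_phase_b * fraction_of_pool_rebuilt


# Hypothetical inputs: 2 onsets per 1000 person-years, mean Phase B of 3 years.
ONSET_RATE = 0.002
MEAN_PHASE_B = 3.0

first = prevalence_screen_yield(ONSET_RATE, MEAN_PHASE_B)
print(f"prevalence screen: {1000 * first:.1f} per 1000 screened")
for interval in (1, 2, 3):
    later = incidence_screen_yield(ONSET_RATE, MEAN_PHASE_B, interval)
    print(f"incidence screen, {interval}-year interval: "
          f"{1000 * later:.1f} per 1000 screened")
```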


9.2.4 Interval Cancers


Interval cancers often are considered failings of cancer screening, 
even though cancer screening is not designed or expected to lead 
to detection of every Phase B cancer. Some conditions that lead to 
interval cancers, for example, errors in test interpretation and 
missed screens, may be addressable, but it is unrealistic to believe 
that interval cancers can be eliminated. Interval cancers are a 
reminder of the limits of cancer screening. 

Cancer can be detected serendipitously, meaning that an unre- 
lated diagnostic medical test or procedure inadvertently finds an 
abnormality that is suspicious for cancer. An MRI performed to 
investigate back pain could identify a colonic mass, for example. 
Whether serendipitously detected cancers are interval cancers is 
open to debate. They do not arise from symptoms but they may 
have been missed on the previous organ-specific cancer screening 
test. 


9.3 Topics Regarding Policy 
9.3.1 Selecting a Cancer Screening Interval 


The phrase cancer screening interval refers to the time between 
screens. Though the choice of the screening interval should be 
based exclusively on the average length of Phase B and how variable
it can be, historically, it has not. It is only recently that
screening intervals have started to reflect the natural history of 
cancer. In the past, screening intervals were typically 1 year, prob- 
ably because cancer screening was associated with the practice of 
having an annual physical. 

The choice of screening interval will impact effectiveness and 
the magnitude of harms. It also will drive costs and availability of 
health care resources. Ideally, these factors are weighed in con- 
junction with knowledge of the natural history of cancer to arrive 
at a screening interval that affords benefit but does not strain a 
health care system. 
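
The dependence of the screening interval on the length of Phase B can
be illustrated with a simplified calculation in Python. It assumes that
cancers enter Phase B at a uniform rate between screens, that the
length of Phase B is exponentially distributed, that the test is
perfectly sensitive, and that everyone attends every screen. The
function name and the values used (mean Phase B lengths of 2 and 6
years, intervals of 1 to 3 years) are hypothetical.

```python
import math


def interval_cancer_fraction(mean_phase_b, interval):
    """Share of cancers arising between screens that surface clinically
    before the next screen, under the assumptions stated above."""
    if mean_phase_b <= 0 or interval <= 0:
        raise ValueError("mean_phase_b and interval must be positive")
    # Share still in Phase B, and so detectable, at the next screen.
    caught_at_next_screen = (mean_phase_b / interval) * (
        1 - math.exp(-interval / mean_phase_b)
    )
    return 1 - caught_at_next_screen


for mean_phase_b in (2.0, 6.0):
    for interval in (1, 2, 3):
        share = interval_cancer_fraction(mean_phase_b, interval)
        print(f"Phase B mean {mean_phase_b:.0f} y, interval {interval} y: "
              f"{share:.0%} surface between screens")
```

In this sketch, a cancer with a long Phase B tolerates a longer
interval before many cases escape detection, while a cancer with a
short Phase B does not. Harms and costs, which also depend on the
interval, are not represented.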


9.3.2 De-implementation


De-implementation refers to the reduction or cessation of a ser- 
vice provided by health care practitioners. Calls for de- 
implementation may be made when practices do not benefit 
patients, including when they are harmful or wasteful. The need 
for de-implementation may arise in the instance of adoption of a 
practice whose benefit is uncertain, or if a practice observed to be 
efficacious is not effective. A well-known instance of de- 
implementation is the reduction in prescribing of postmenopausal 
hormone therapy after users experienced an increase in breast 
cancer risk [10]. 

De-implementation has been discussed in the context of cancer 
screening for a number of reasons. Some cancer screening tests 
have become widely adopted in clinical practice without strong or 
direct evidence that their use reduces cause-specific mortality; 
some also have been adopted without complete understanding of 
the harms they cause. A notable example of the former is thyroid 
cancer screening. Low-cost ultrasound thyroid cancer screening 
became available in South Korea in the 1990s even though the
practice had never been evaluated in an RCT. Thyroid cancer inci- 
dence increased 15-fold from 1993 to 2011, although no change 
in thyroid cancer mortality occurred concurrently. In 2015, the 
Korean Committee for National Cancer Screening Guidelines 
issued a recommendation against thyroid cancer screening with 
ultrasonography for healthy individuals [11, 12]. 

De-implementation will result in reversal of the effects on 
intermediate outcomes described in Chap. 5. Incidence of inva- 
sive cancer (in the case of cancer screening that detects only inva- 
sive disease) and case survival will decrease, and assuming all 
else remains the same, should approach their pre-screening levels. 
The number of early stage cancers should decrease due to elimi- 
nation of overdiagnosis. The number of late stage cancers will not 
change if cancer screening did not result in down staging, and will 
increase if it did. 

As it is for implementation, it is critical to track the changes in
both intermediate and definitive outcomes during a period of 
cancer screening de-implementation. Both implementation and 
de-implementation are by necessity based on certain assumptions; 
therefore, the impact cannot be predicted with certainty. It is
particularly important to watch for unexpected consequences, be they
favorable or
deleterious. 


9.3.3 Reduction in Advanced-Stage Cancer 


A reduction in advanced-stage cancer, usually distant cancer, has 
been suggested as a surrogate for cause-specific mortality. The 
push to use advanced-stage cancer has to do, at least in part, with 
the desire to obtain answers regarding the impact of cancer screen- 
ing without having to wait for a cause-specific mortality outcome. 
A reduction in the number of distant-stage cancers may be the 
best of the intermediate cancer screening outcomes in terms of 
correlation with reductions in cause-specific mortality, but it still 
does not reflect experience after diagnosis and does not measure 
how cancer screening alters length of life. 

Legitimate use of a reduction in distant-stage cancers as what 
is, in effect, a definitive endpoint requires that those cancers are 
fatal, and often they are. It also assumes that non-distant-stage 
cancers have a better prognosis, which in most situations they do. 
Yet consider a cancer that, in the absence of cancer screening, 
would be diagnosed at a distant stage, but in the presence of can- 
cer screening, is diagnosed at a regional stage. If the prognosis for 
regional stage cancer is the same as that of distant-stage cancer, 
no reduction in cause-specific mortality would occur even though 
the number of distant-stage cancers has decreased. 
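
The scenario can be made concrete with a small worked example in
Python. The stage counts and case-fatality proportions are hypothetical
and were chosen only to show the arithmetic: screening moves 10 of 20
distant-stage cancers to the regional stage, and the result is compared
under two assumptions about regional-stage prognosis.

```python
def expected_deaths(cases_by_stage, fatality_by_stage):
    """Expected cancer deaths given stage counts and case fatality."""
    return sum(
        count * fatality_by_stage[stage]
        for stage, count in cases_by_stage.items()
    )


# Hypothetical stage distributions for 100 cancers.
without_screening = {"local": 50, "regional": 30, "distant": 20}
with_screening = {"local": 50, "regional": 40, "distant": 10}

# Hypothetical case-fatality proportions.
same_prognosis = {"local": 0.1, "regional": 0.9, "distant": 0.9}
better_regional = {"local": 0.1, "regional": 0.4, "distant": 0.9}

for label, fatality in (("regional as fatal as distant", same_prognosis),
                        ("regional less fatal", better_regional)):
    before = expected_deaths(without_screening, fatality)
    after = expected_deaths(with_screening, fatality)
    print(f"{label}: {before:.0f} deaths without screening, "
          f"{after:.0f} with screening")
```

In both cases the number of distant-stage cancers is halved, yet deaths
fall only when regional-stage disease carries a better prognosis.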

If the day comes when cancer is no longer fatal even at a dis- 
tant stage, the goals of cancer screening will need to be reas- 
sessed. In the meantime, the choice of distant-stage disease as a 
definitive endpoint must be made carefully and on a situation-by- 
situation basis. 


9.3.4 Benefit in the Absence of a Mortality Reduction


Once upon a time there was no cancer screening in the US. When 
discussions regarding establishment of population-based cancer 
screening began in earnest, the proposed metric of benefit was a 
reduction in cause-specific mortality, as cancer was considered to 
be a life-threatening disease. Diagnoses often occurred at late 
stages and few, if any, effective treatments were available once 
cancer spread beyond the organ of origin. 

The first breast and colorectal cancer screening tests to become 
established in the US were shown to reduce cause-specific mortal- 
ity in at least one RCT. Those tests, film-screen mammography 
and guaiac-based fecal occult blood testing, have since been 
replaced with tests that are more technologically advanced: digital 
mammography and breast tomosynthesis, and fecal immuno- 
chemical testing, flexible sigmoidoscopy, and colonoscopy. Yet 
none of the replacement tests was vetted in a study that assessed 
cause-specific mortality prior to adoption. 

When replacement tests are adopted, it is under the
assumption that the new test will confer the same or a greater
reduction in cause-specific mortality as the test it is replacing. The
replacement tests also have characteristics that make them more
desirable than the tests they are replacing. They may have better
performance measures, such as lower false positive rates, or they 
may be more acceptable to patients. They could be less expensive 
when all components of the screening process are considered. 

In my opinion, future cancer screening tests that target an
organ for which no efficacious screening test exists should be
implemented in clinical practice only when high-level evidence is
available to support a reduction in cause-specific mortality. Others 
may feel differently. Some have argued that a shift to a stage at 
diagnosis that is simpler to treat is benefit enough, though the 
consequences that come with a cancer diagnosis earlier in time 
must not be ignored. Those include intense surveillance regimens, 
chemoprevention strategies, and psychological challenges for 
periods of time that are longer than those that would have occurred 
if cancer had been diagnosed later. 

Whether it is appropriate to adopt replacement tests in clinical
practice without formal vetting using a cause-specific mortality
endpoint or another measure of the balance of benefit to harm depends
on the cancer at hand and the differences between the replacement and
original tests. There are some instances in which a strong argument can
be, and has been, made for adoption without full knowledge about the
impact on benefits and harms. Data are available to retrospec- 
tively support some of the decisions made regarding replacement, 
including the choice to adopt colonoscopy screening for colorec- 
tal cancer. 


SEER data indicate that cancer incidence has risen slightly (10%),
5-year cancer survival has risen substantially (40%), and cancer 
mortality has dropped modestly (25%) since 1975 [1]. Increases
in incidence and 5-year case survival are difficult to interpret for
reasons discussed in this primer. But a reduction in cancer mortal- 
ity of any magnitude is a success. 

President Nixon spoke of the conquest of cancer when he pro- 
posed the National Cancer Act in 1971 [2], and others have used 
similar war-like language over the years. Data from SEER, how- 
ever, suggest that we have not conquered cancer. We know much 
more about cancer today than we did in 1971, but we still do not 
seem to know enough to make a huge impact. Cancer “fads” have 
come and gone; some have and some haven’t made a lasting dif- 
ference. Chemoprevention has reduced the risk of breast cancer 
recurrence, and prevention of smoking initiation and smoking 
cessation have led to meaningful decreases in lung cancer inci- 
dence and mortality. On the other hand, autologous bone marrow 
transplant for breast cancer was used for a number of years to no 
avail. Our understanding of the relationship of diet and cancer is 
still poor. It was estimated that in 2018, 600,000 Americans would 
die from cancer. Though a small percentage of the population, the 
absolute number is large. 

Where does cancer screening fit into the picture? It depends
on who you ask. Most researchers believe that earlier detection 
due to cancer screening has led to some reduction in cancer 
mortality, though there is widespread disagreement as to the 
degree of its impact. There is general agreement, however, that 
cancer screening programs impact health care spending and 
availability of resources yet benefit only a few of those who are 
screened. There is less agreement regarding what
constitutes benefit and harm, and even less regarding the accept- 
able ratio of harm to benefit. Discussion of the complexities of 
cancer screening began to appear in some lay press publications 
about 20 years ago, yet the predominant feeling among indi- 
viduals in the general public is that cancer screening is impor- 
tant and worthwhile, and that cancer detection at the earliest 
stage can only lead to good. 

The forces that drive the availability of cancer screening and 
the choice to be screened are complex. So too are the issues that 
were covered in this primer. I believe, however, that most people, 
patients and clinicians alike, are educable, and the complexities of 
cancer screening need not be out of reach. I hope that this primer 
has helped you in your quest to understand. 